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1.  INTRODUCTION 


1.1  STATEMENT  OF  THE  PROBLEM 

The  objective  of  this  study,  conducted  under  Contract  No.  F30602-78-C- 
0271,  is  to  determine  a  modular  architecture  for  a  multiprocessor  system  to 
compute  adaptive  weights  to  maximize  the  signal-to-noise  ratio  (SNR)  of  a 
multichannel  input  radar  system  with  a  large  number  of  weights.  The  radar 
system  to  be  considered  has  channel  inputs  in.  all  three  signal  domains-- 
spatial,  temporal,  and  polarization--with  multiple  inputs  in  each  domain. 

The  adaptive  system  must  sense  the  environment  and  adjust  at  least  200  complex 
weights  imposed  on  the  array  elements,  on  the  delay  taps  off  each  element, 
and  on  the  polarization  sensors,  thus  maximizing  the  signal-to-noise  ratio. 
This  sensing  must  be  accomplished  on  a  time-varying  basis,  as  the  environment 
dictates,  with  minimum  convergence  time,  minimum  noise,  and  maximum  stability. 

For  a  multiprocessor  to  be  able  to  calculate  at  least  200  complex 
weights  quickly  enough  to  adapt  to  the  changing  environment,  not  only  must  a 
fast  adaptive  algorithm  and  a  fast  processor  system  be  chosen,  but  the 
algorithm  chosen  must  itself  be  well  suited  to  the  processor  system.  The 
algorithms  to  be  explored  are  based  on  the  Gram-Schmidt  orthogonal iza  ion 
process.  A  modular  architecture  to  support  these  algorithms  is  designed. 

1.2  RESTATEMENT  OF  THE  PROBLEM  IN  MATHEMATICAL  TERMS 

Each  antenna  element  in  an  adaptive  array  produces  an  input  signal, 
which  may  be  written 


Xj(t)  =  Nj(t)  +  Sj (t) ,  j=l , . . .  ,n 


(1) 


7 


where  X  is  the  complex  waveform  as  a  function  of  time  t,  N  is  the  noise 
component,  S  is  the  signal  component,  the  subscript  j  designates  the  parti¬ 
cular  antenna  element,  and  n  is  the  total  number  of  antenna  elements.  X,  N, 
and  S  may  be  thought  of  as  complex  column  vectors  of  their  components: 

'  xn (t) 

x2(t) 

X(t)  =  .  .  (2) 

.  x"(t)  . 

Instead  of  these  continuous  waveforms,  we  will  usually  deal  with  the  sample 
vectors  X^,  N^,  and  S ^ ,  defined  by: 

X(l)  =  X(ti)  ,  (3) 

where  the  t^  are  closely  spaced  points  in  time.  Let  W  be  a  column  vector  of 
complex  numbers.  We  form  the  filter  function 

F  =  W*X  (4) 

as  the  dot  product  of  W  and  the  input  vector  X  (where  W*  =  W  conjugate 
transpose). 

Our  problem  is  to  choose  the  weights  W  to  optimize  the  SNR  of  the 
system,  and  thereby  maximize  the  probability  of  detection  of  the  signal  in 
the  presence  of  jammers  and  clutter.  These  weights  are  a  function  of 
the  sequence  of  sample  input  vectors  X^.  This  report  explores  not  only 
different  algorithms  for  calculating  the  weights  W  through  orthogonal izing 
the  inputs  X^  and  filter  function  F,  but  also  the  effects  of  different 


modular  parallel  computer  architectures  on  the  implementations  of  these 
algorithms.  These  effects  include  execution  speed,  accuracy,  convergence 
rates,  and  fault  tolerance. 


1.3  APPROACH 

Let  N  be  the  column  vector  of  noise  inputs  as.  as  in  Section  1. 
Assuming  the  n  components  N..  have  a  multivariate  Gaussian  distribution 
can  write  their  covariance  matrix  as 

M  =  E(NN*)  . 

★ 

It  has  been  shown  [Brennan  and  Reed  1973]  that  the  optimal  weights  W 
given  by 

W  =  kM^S 

where  S  is  the  expected  signal  vector  (called  steering  vector)  and  k  i 
nonzero  constant  (normally  1). 

M  is  not  known,  but  must  be  estimated  from  sample  data 

M  =  I  £  x<J>  x(j)*  , 

5  j-1 


L.  E.  Brennan  and  I.  S.  Reed,  "Theory  of  Adaptive  Radar,"  IEEE 
on  Aerospace  and  Electronic  Systems,  Vol.  AES-9,  No.  2,  1973,  pp.  237- 


2. 

,  we 

(5) 

are 

(6) 

s  any 


(7) 


Trans. 
252T- 


where  are  sample  voltage  vectors  and  S  is  the  number  of  samples.  M  is 
an  n  x  n  dense  positive  definite  Hermitian  matrix  [Liles  and  Demmel  1978],* 

A  A 

and  so  forming  M  as  in  Eq.  (7)  and  then  computing  W  from  Eq.  (6)  with  M  in 
place  of  M  takes  a  great  deal  of  computation. 

Now  let  T  be  a  given  linear  transformation.  Then  the  covariance 
matrix  My  of  the  random  vector  TN  is  given  by 

My  +  E [TN (TN )*]  =  TE(NN*)T*  =  TMT*  .  (8) 

Thus,  we  have 

W  =  M-1S  =  T*Mj]TS  .  (9) 

If  T  can  be  chosen  so  that  TS  is  easily  computable,  and  My  is  diagonal, 
we  will  have  simplified  our  problem.  Such  a  T  is  given  by  the  Gram-Schmidt 
orthogonal izati on  process  and  produces  a  lower  triangular  matrix  T  such  that 
the  entries  of  the  random  vector  TX  are  orthogonal. 

The  basic  implementation  of  the  processor  for  computing  T  and  TX  is 
shown  in  Figure  1  .  Each  processor  is  very  simple,  and  all  are  identical. 

The  array  operates  with  each  row  working  in  parallel  on  an  intermediate  step 
of  TX,  and  with  different  rows  working  as  stages  in  a  pipeline  to  compute  TX 
for  different  X's.  The  array  overlaps  its  updates  of  its  own  internal 
coefficients  w-j  with  computing  TX  for  different  values  of  X. 

if 

W.  C.  Liles  and  J.  Demmel,  "Solving  Large  Positive  Definite 
Hermitian  Linear  Systems  Utilizing  Parallel/Pipeline  Processors,"  Proceedings 
of  the  1978  International  Conference  on  Parallel  Processing,  IEEE,  1978, 
pp7T61-262. -  - 


Our  approach  is  to  analyze  this  basic  scheme  and  its  variations  with 
respect  to  the  following  factors: 

1.  Variations  on  the  basic  mathematical  algorithm.  Section  2. 

We  consider  their  different  hardware  implementations  (e.g., 
cost  and  reliability  properties)  and  stability  (e.g.,  the 
number  of  bits  required  for  intermediate  steps). 

2.  Different  ways  of  computing  the  internal  coefficients,  w. . 
Section  3.1.  The  different  algorithms  are  analyzed  with1J’ 
respect  to  their  hardware  implementations,  number  of  samples 
required  for  convergence,  and  sensitivity  to  the  number  of 
jammers.  Simulations  were  performed. 

3.  Error  sensitivity  and  fault  tolerance,  Section  3.2.  Both 
theoretical  and  simulation  results  are  presented  on  the 
effects  of  catastrophic  and  gradual  failures  of  processors 
and  receiver/antenna  systems. 

4.  Different  ways  to  perform  arithmetic.  Section  3.3.  We 
consider  both  fixed-  and  floating-point  implementations,  and 
try  to  derive  probabilistic  bounds  on  the  growth  of  inter¬ 
mediate  results  during  the  computations. 

5.  Implementation  with  fewer  processors.  Section  3.4.  We 
consider  speed,  cost,  and  reliability  tradeoffs  when  using 
only  n  processors  instead  of  OCn^ )  implementations  of 
Figure  1. 

6.  Individual  processor  constructions,  Section  3.5.  The 
different  ways  of  implementing  each  processor  P..  are 
considered  with  respect  to  speed  and  hardware  complexity. 

7.  Radar  engineering  considerations,  Section  3.6.  We  indicate 
how  different  system  parameters  might  affect  an  engineer's 
decision  of  what  version  of  the  Gram-Schmidt  algorithm  to 
implement. 

Finally,  we  select  a  typical  set  of  radar  system  parameters  and  give  a 
detailed  hardware  design  for  a  Gram-Schmidt  processor  incorporating  those 
parameters  (see  Section  3.7). 


2.  DESCRIPTION  OF  THE  ALGORITHM 

2,1  INTRODUCTION 

An  n-element  radar  system  has  inputs  X..,  1  <  i  £  n,  where  each  component 
X.j  is  the  sum  of  noise  N..  and  signal  S^,  expressed  in  column  vector  notation  as 

X  =  N  +  S  .  (10) 

The  output  of  the  system  is  the  filter  function,  F,  given  by 

F  =  W*X  ,  (11) 

where  W  is  a  column  vector  of  complex  weights  and  *  denotes  conjugate  transpose. 
Brennan  and  Reed  [1973]  have  established  that  in  order  to  maximize  the 
signal -to-noise  ratio  (SNR)  of  the  system,  W  should  satisfy 

MW  =  S  ,  (12) 

where  M  is  the  coyariance  matrix  of  the  noise 

"u  ■  E<¥j>  •  (,3> 

In  a  real  system,  Mis  not  known  and  must  be  estimated  from  incoming  samples  X. 
This  section  describes  an  alaorithm  based  on  Qram-Schmidt  (G-S)  orthogonal izati on 
to  form  M,  solve  the  system  of  simultaneous  linear  eouations  determining  W, 

(Fq.  (12)),  and  compute  F. 

The  literature  contains  both  practical  and  highly  theoretical  results 
on  solving  systems  of  equations.  Csanky's  algorithm  [1976]*  requires  the 

L.  Csanky,  "Fast  Parallel  Matrix  Inversion  Algorithms,"  SIAM  J.  on 
Computing,  Vol.  5,  1976,  pp.  618-623. 
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least  number  of  operations  of  any  algorithm  [0(log  n)J  but  is  impractical 
because  it  requires  too  many  processors  connected  in  a  complicated  way,  and 
is  also  numerically  unstable.  Gentlem^i  £1978]  has  given  bounds  on  the  time 
spent  routing  data  between  processor^  during  Gaussian  Elimination.  Several 
investigators  have  benchmarked  various  algorithms  cn  different  machines 
[Calahan  et  al.  1976;  Liles  and  Derrnel  1978;  Blakely  1977;  Berra  and  Singhania 

"felt  'Jc'JcIc 

1976;  Sameh  and  Kuck  1975].  Sameh  and  Kuck  [1975,  1977a,  1977b,  1978] 
have  published  survey  articles.  Kung  and  Leiserson  [1978]+  have  produced 
algorithms  emphasizing  simple  processors  and  simple  interconnect  structures. 

The  matrix  M  in  our  problem  has  two  very  nice  properties  which  make 
solving  Eq.(12)  easier  than  in  the  case  of  a  general  matrix:  It  is  Hermitian  (i.e., 


W.  M.  Gentleman,  "Some  Complexity  Results  for  Matrix  Computations  on 
Parallel  Processors,"  Journal  of  the  Association  for  Computing  Machinery, 

Vol .  25,  No.  1,  January  1976,  pp.  1 12-11  !>. 

**D.  A.  Calahan-,  W.  N.  Joy,  and  D.  A.  Orbits,  Preliminary  Report  on 
Results  of  Matrix  Benchmarks  on  Vector  Processors,  Systems  Engineering 
Laboratory,  SEL  Report  No  94,  University  of  Michigan,  Ann  Arbor,  May  24,  1976; 
C.  Blakely,  "PEPE  Application  to  BMD  Systems,"  Proceedings  of  the  1977 
International  Conference  on  Parallel  Processing,  August  26-27,  1977,  pp.  1 93- 
198;  P.  B.  Berra  and  A.  K.  Singhania,  TimingMgures  for  Inverting  Large 
Matrices  Containing  Complex  Numbers  Using  the  Staran  Associative  Processor, 

Rome  Mr  Development  Center,  RADC-TR-76-339,  Griffiss  Air  Force  Base,  New  York , 
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M  =  M*)  and  positive  definite  (i.e.,  all  its  eigenvalues  are  positive,  or, 
equivalently,  Z*MZ  >  0  for  all  nonzero  vectors  Z)  [Liles  and  Demmel  1978]. 

Positive  definite  Hermitian  matrices  arise  frequently  in  numerical 
work,  often  implicitly  as  the  sample  covariance  matrix  of  a  set  of  sample 
vectors.  This  is  the  situation  not  only  in  the  adaptive  signal  processing 
problem  which  motivated  this  study,  but  in  many  regression  and  least  squares 
problems.  For  example,  solving  the  system  AX  =  B  in  a  least  squares  sense  is 
equivalent  to  solving  the  normal  equations  (A*A)X  =  A*B,  where  A*A  is  positive 
definite  and  Hermitian  (if  A  has  full  rank  and  where  A*  *  A  conjugate  trans¬ 
pose).  Positive  definite  Hermitian  matrices  also  arise  explicitly  in  varia¬ 
tional  problems,  such  as  solving  self-adjoint  differential  equations  with 
finite  element  methods. 

The  remainder  of  this  section  is  organized  as  follows:  Section  2.2 
gives  a  statement  of  the  problem  and  the  approach  to  the  problem;  Section  2.3 
presents  an  implementation  of  TS;  Section  2.4  computes  the  w^'s;  Section  2.5 
gives  an  implementation  of  Mj^(TS);  Section  2.6  presents  an  implementation 
of  T*(Mj  TS);  and  Section  2.7  describes  possible  simplification  of  the 
implementation  in  the  case  of  a  side-lobe  canceller  (SLC).  Appendix  C 
presents  a  numerical  example  of  how  to  use  the  G-S  array  to  solve  a  system 
of  simultaneous  linear  equations. 
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2.2  STATEMENT  OF  THE  PROBLEM  AND  APPROACH 

Let  the  system  to  be  solved  be  MW  =  S,  where  M  is  a  positive  definite 
Hermitian  n  x  n  matrix,  S  is  an  n  x  1  vector,  and  W  is  an  n  x  1  vector  of 
unknowns.  If  T  is  a  nonsingular  n  x  n  matrix  (to  be  specified  later),  define 
MT  =  TMT*,  where  T*  =  (T)T.  Then  My  is  also  positive  definite  Hermitian  (by 
Sylvester's  law  of  inertia)  and  W  =  M  =  T*MyHs.  By  choosing  T  so  that 

T,  My\  and  T*  are  easily  computable,  we  will  have  simplified  our  problem. 

A  good  choice  of  T  is  given  by  the  Gram-Schmidt  orthogonal ization  process  set 
forth  in  Eq.  (14)  [Rice  1966]*,  where  <a,  b>M  denotes  the  complex  inner  pro¬ 
duct  induced  by  the  matrix  M:  <a,  b>M  denotes  the  complex  inner  product 
induced  by  the  matrix  M:  <a,  b>M  =  b*Ma . 


=  $ 
i  - 


zr  ■  zi 


z  = 
zj  -  zj 


(input) 
ji 


<Zj-ZJ>H  ,1 


1  1  i  1  n  "  1 »  i  <  j  1  n  l 


1  M 


(output: 


TS) 


(14) 


The  coefficients  w^j  =  <z!j,  zl  >  ^/-cZl ,  Z^  are  obtained  by  performing 
Gram-Schmidt  orthogonal  ization  on  the  standard  basis  e1^,  1  £  j  ±  n,  where 
e1?  =  (1  if  1  =  j  and  0  otherwise}  with  the  inner  product  <.  ,  .>M  .  Hence, 
the  vectors  e^  are  orthogonal  with  respect  to  the  inner  product  induced  by  My, 

vrfrich  impl les  MT  is  a  diagonal  matrix,  since  <e^ ,  e^>M  =  MT  *  0  unless  i  *  j. 

i  i  T  ij 

Also,  My  =  <e  ,  e  >M  =  <Z^ ,  Z^. 


J.  R.  Rice,  "Experiments  on  Gram-Schmidt  Orthogonalization,"  Math. 
Comput. ,  20  April  1966,  pp.  325-328. 
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The  T  induced  by  Eq. (14)  is  unit  lower  triangular  (lower  triangular 
with  ones  on  the  diagonal),  since  Zj  =  Sj  plus  a  linear  combination  of  S-j 
through  S j_-j .  Hence  T”1  is  also  unit  lower  triangular,  and  both  T*  and  (T*)”1 
are  unit  upper  triangular.  Since  M  =  T‘^My(T"^)*,  we  have  written  M  as  the 
product  of  a  unit  lower  triangular  matrix  L  =  T”\  a  diagonal  matrix  D  =  My, 
and  L*  =  (T"l)*;  M  =  LDL*.  This  is  the  same  decomposition  provided  by  the 
Cholesky  algorithm  (without  square  roots  [Liles  and  Demmel  1978]),  and  because 
the  Cholesky  decomposition  is  unique,  (T_1 )  and  My  are  the  matrices  that 
would  be  produced  by  the  Cholesky  algorithm.  In  fact,  we  show  in  Sub¬ 
section  2.6.2  that  the  coefficients  w^  are  the  entries  of  the  matrix  L. 
Viewing  M  as  the  covariance  matrix  of  a  column  vector  V  of  random 

variables  with  a  multivariate  normal  distribution  with  zero  means  (which  is 

u 

computed  as  the  average  of  the  covariances  of  s  sample  vectors  V  ,  1  <  k  £  s: 

J,  _ 

j  s  E(V.Vj)),MT  is  then  the  covariance  matrix  of  the  random 

vector  Z  =  TV: 


MT  =  E(ZZ*) 

=  E [TV (TV )* ] 

=  E(TVV*T*) 

=  TE(VV*)T*  since  T  is  constant 
=  TMT* 


Thus  MT  diagonal  means  Z.  and  Z.  are  uncorrelated  (orthogonal).  This  statis- 
tical  interpretation  helps  us  recognize  that  this  method  may  be  used  to  solve 
least  squares  and  regression  problems. 

The  variation  on  Eq.  (14)  given  in  Eq.  (15)  uses  a  nonunit  lower  triangular 
matrix  T  and  corresponds  to  the  Cholesky  decomposition  M  =  LL*.  This  variation 
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has  a  different  implementation  and  different  round  off  and  stability  properties, 
which  will  be  discussed  later. 


Z1^  =  s 
L  J  '  bJ 


(input) 

Z]  -  z,J/«z,J.z,J>m)h 


1  <  i  <  n,  i  <  j  <  n 


7ii+l  _  7ii  _  <7*^  7^>7^ 

1  j  'l  t  1  y  Zi  MZi 


Z..  =  zj  (output:  Z  =  TS) 


(15) 


Because  of  the  normalization  in  the  second  line  of  Eq.  (15),  Mj_  c  = 

Hence,  Mj  s  I  =  identity  matrix.  Writing  L  ■=  T  ^  as  before,  we  now  have  M  =  LL*, 

the  Cholesky  decomposition  (with  square  roots).  As  before,  the  coefficients 

w1. .  =  {<Z'?,Z1*>  if  i  f  j;  1/ (<Z ' !  ,Z'!>)^  if  i=j}  are  the  entries  of  the  Cholesky 
1 J  J  11 

factor  L.  The  coefficients  from  Eq.  (15)  are  related  to  the  coefficients  from 
Eq.  (14)  by  wfj  =  w'j  •  for  ift.  If  V  is  a  column  vector  of  random  variables 

as  before,  then  the  entries  of  Z  =  TV  are  independent  Gaussian  random  variables 

with  unit  variances. 


In  summary,  the  algorithm  to  solve  MW  =  S  is: 

1.  Compute  the  w.-j's,  either  directly  from  the  entries  of  the  M 
matrix,  or  from  sample  vectors. 

2.  Compute  Mj\ 

3.  Compute  TS  . 

4.  Compute  (TS). 

5.  Compute  T*(Mj^TS) . 


18 


2.3  IMPLEMENTATION  OF  TS 


The  n(n  -  l)/2  processors  operate  as  follows:  The  processors  in  row  i  operate 
simultaneously  on  values  passed  to  them  by  the  row  above.  Z^+1  is  passed 
vertically  from  P..  to  P.+1  . ,  and  the  rightmost  processor  P.  .  ,  ("diagonal" 
processor)  broadcasts  zj*j  to  all  the  processors  in  the  next  row  simulta- 
eously.  All  processors  operate  simultaneously  and  in  lockstep,  each  row 
performing  its  operations  on  the  intermediate  results  of  a  different  input 
vector  in  one  unit  time  step.  Different  S  vectors  may  be  piped  into  the  top 
of  the  array,  and  a  new  vector  entered  each  unit  time  step. 

A  timing  diagram  is  shr-n  in  Figure  3  for  the  case  n=4.  Superscripts 
indicate  different  vectors;  subscripts,  different  components.  The  horizontal 
axis  is  labelled  in  unit  time  steps.  The  top  four  rows  indicate  when  data  should 
be  submitted  to  the  input  ports  at  the  top  of  the  array,  and  the  bottom  four 
rows  indicate  when  the  components  of  the  output  vectors  ZJ  =  TSJ  are  available 
at  the  output  ports  at  the  right  of  the  array.  One  can  see  that  the  m  pro¬ 
ducts  ZJ  =  TSJ ,  1  <  j  <  m,  can  be  formed  in  n+m  time  steps. 

This  implementation  has  the  following  advantages:  It  has  a  simple 
interconnect  structure,  with  unidirectional  and  fixed-destination  data  flow; 
and  the  processors  are  very  simple,  identical,  and  independent  of  n. 
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INPUT  PORTS 


Z,  OUTPUT 
2  PORTS 


(HORIZONTAL  MOVEMENT  OF  zj  IS  A 
BROADCAST  THROUGH  THE  WHOLE  ROW) 


Figure  2.  Processor  array  for  computing  TS. 
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Figure  3.  Timing  diagram  for  computing  TS. 
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Also,  the  processor  array  works  without  modification  on  vectors  of  size  less 
than  n. 

In  the  case  of  Eq . (15)  (with  square  roots)  of  Section  2.2,  the  only 
difference  in  implementation  is  that  there  must  be  a  processor  to  divide  Z'| 
by  its  norm  (< Z 1 1  ,Z' before  broadcasting  it. 
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2.4  COMPUTATION  OF  THE  w^'s 

The  w..'s  are  computed  in  two  ways,  depending  on  whether  the  matrix  M 
1 3 

is  known  or  must  be  estimated  from  vectors  sampled  from  the  underlying 
distribution.  It  turns  out  that  the  two  methods  require  similar  hardware. 

We  first  discuss  the  algorithm  without  square  roots  (Eq.  (14)). 

If  the  matrix  M  is  available,  the  w^'s  are  computed  by  pipelining  the 
columns  of  M  into  the  top  of  the  processor  array,  and  performing  the  compu¬ 
tations  indicated  in  Figure  4.  Z.(k)  denotes  the  intermediate  result 

J 

from  the  k^  column  of  the  matrix  M.  The  IF-THEN-ELSE  in  Figure  4  may  be 
implemented  by  having  a  control  line  for  each  row  of  the  array:  During  normal 
operation  (computing  TS,  or  the  ELSE  clause  in  Figure  4)  the  control  signal 
is  off,  and  when  the  data  from  column  i  has  finally  reached  row  i  in  the  array 
the  control  signal  should  be  turned  on  to  tell  the  processors  to  compute  w^. 

A  timing  diagram  for  the  complete  implementation  appears  later,  in  Figures 
7  and  9  (Section  2.6). 

The  configuration  just  described  requires  a  divide  unit  in  each 

processor.  Because  each  processor  in  row  i  computes  the  same  reciprocal 
1»/Z^(i),  the  diagonal  processor,  . ,  can  also  compute  this  value 

and  broadcast  it  along  with  zl ( i )  itself.  This  approach  simplifies  all  but 
n  -  1  of  the  processors,  but  requires  more  data  to  be  broadcast. 

Either  configuration  can  compute  all  the  w. ,'s  in  n+1  unit  time 

*  J 

steps. 

If  sample  vectors  are  known  instead  of  the  actual  matrix,  the  method 
is  similar.  If  s  sample  vectors  are  given,  all  s  must  be  passed  through 
the  array  n-1  times,  performing  the  computations  shown  in  Figure  5a. 

Each  time  the  s  samples  are  passed  through,  another  row  of  w. .'s  is  calculated 

*  J 
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Figure  5a.  Calculating  w.^  with  sample  vectors 
as  input  (without  square  roots). 
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Figure  5b.  Calculating  w^  with  sample  vectors 
as  input  (with  square  roots). 
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In  the  expression  Zj(k)  the  k  denotes  which  row  of  w^.'s  is  being  calculated. 
A  total  of  C (s  +  1 ) (n  -  1)  +  1]  tine  steps  are  required  to  compute  all  the 


As  in  the  case  where  the  matrix  M  was  input,  an  alternate  configuration 
is  to  have  only  the  diagonal  processor  compute  w^.  =  w^  +  Ij(k)  ’  and 

1/w..  and  broadcast  1/w.. 

*  J  '  J 

The  implementation  of  the  algorithm  with  square  roots  (Eq.  (15))  is 
similar  and  is  shown  in  Figures  4b  and  5b. 

Since  each  processor  in  row  i  needs  the  same  quantity  w!.  (or  its 
reciprocal),  it  is  possible  to  have  a  diagonal  processor  P...  to  calculate  this 
value  only  and  broadcast  it  (along  with  z|(k),  in  the  case  of  Figure  5b). 

This  possibility  increases  communication  needs  slightly  but  makes  each  P^. 
much  simpler. 

Other  methods  exist  for  computing  the  w's  when  sample  vectors  are 

1  J 

used;  they  are  variations  on  the  general  scheme  shown  above  and  are 
discussed  in  Section  3.1. 


i 

t 
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2.5  IMPLEMENTATION  OF  M^(TS) 

"1  11^1 

Because  My  is  diagonal,  My  =  diag(<Z^ ,Z^>^  ),  and  these  values  are 

i  i  i 

available  in  any  processor  (1  ./<Zj  .Z^  is  in  Pj  for  all  j),  it  is  easy  to 
compute  My^(TS):  Just  attach  a  multiply  unit  to  each  output  port  on  the  right 
of  the  array  and  multiply  the  output  at  port  i  by  the  value  of  <zj,z|>j^ 
borrowed  from  the  processor  just  to  the  left,  p’.  When  the  algorithm  with 
square  roots  is  used.  My  is  the  identity  matrix  and  so  nothing  needs  to  be 
done. 
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2.6  IMPLEMENTATION  OF  T*(M~]TS) 

This  step  of  the  algorithm,  the  formation  of  T*(M^TS),  has  three  differ¬ 
ent  implementations.  The  first,  the  unit  vector  method,  uses  the  array  shown 
in  Figure  2  (Section  2.3)  with  n  additional  processors  on  the  right.  It  can 
solve  the  m  problems  MW1  =  S1 ,  1  £  i  £  m,  in  time  0(mn)  with  high  efficiency 

E  =  processor  on-ttme/(total  time  •  total  processors) 

=  (m/m+1 )  [(n+1)2  -  2)/(n+l)2] 

The  second  method,  the  reverse  flow  method,  requires  some  chanqes  to  the 
array  but  preserves  the  simple  interconnect  scheme.  It  can  solve  the  m 
problems  in  time  0(m+n)  but  with  somewhat  lower  efficiency 

E  =  m/[m  +  2 ( n+1 ) ] 

The  third  method,  the  transform  space  method,  is  essentially  different  from  the 
other  two  in  its  use  of  the  array  to  compute  the  filter  function,  F,  directly. 

The  first  two  methods  compute  the  weights,  W,  which  must  then  be  used  (by  another 
piece  of  hardware)  to  compute  F. 

Overall  timings  of  the  different  implementations  are  given.  A  more 
detailed  comparison  of  the  unit  vector  and  reverse  flow  methods  is  given  in 
Appendix  F. 

2.6.1  Unit  Vector  Method 

The  unit  vector  method  forms  T*(Mj^TS)  with  a  vertically  connected 
array  of  n  simple  processors  connected  to  the  output  ports  of  the  array  as 
in  Figure  6.  First,  the  vector  S  is  passed  through  the  array,  and  the  n 
numbers  Y.  =  (mZ^Ts)  .  are  formed  and  stored  in  registers  inside  the 

J  *  vJ 

processing  elements  ( P E' ' s ) .  Then, the  n  unit  vector  EJ  (E^  =  1  if  i  =  j  and  0 
otherwise)  are  piped  through  the  array  to  form  the  n  products  TE"1.  Given  that 
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(T*Y)  =  Z  T  Y  =  t  W)k\ 

J  k=l  kJ  K  k=l  K  ' 


and  that  terms  (TE^ and  Yk  are  both  available  at  the  kth  output  port, 
these  terms  may  be  multiplied,  added  to  a  partial  sum  computed  by  PE^  ^ 
during  the  preceding  time  step,  and  passed  to  PEk+i  for  the  next  time  step, 
as  shown  in  Figure  6. 

A  timing  diagram  for  the  entire  process  of  solving  MW1  =  S1  for 
n  =  3  and  1  <  i  <  2  is  shown  in  Figure  7.  Yj  denotes  (Mj^TS1).,  and  d! 
denotes  the  output  of  PE.  with  input  (TE1).. 

J  J 

A  timing  analysis  may  be  made  easily  from  the  timing  diagram.  A 
total  of  n  time  steps  are  required  to  compute  the  w^'s.  Not  counting  these 
time  steps,  a  total  of  Tyy  =  (m+l)(n+l)  units  of  time  are  required  to  pass  m 
vectors  S1  and  mn  unit  vectors  EJ  through  the  array.  We  may  compute  the 
efficiency  of  the  array  where  Tp  is  the  time  processor  p  is  on,  P  is  the  number 
of  processors,  and  T  is  total  time  required  to  finish  processing,  as  follows: 


EUV 


2L  Tp/(T*P) 

all  processors 


Each  of  the  n(n-l)/2  processors  in  the  array  has  Tp  =  m(n+l),  and  each  of 
the  n  outboard  PE'S  has  Tp  =  mn ;  therefore. 


E 


UV 


SituLjj. 

(n+1  ) 


Thus,  Eyy  approaches  1  as  m  and  n  grow.  The  unit  vector  method  is  thus 
an  efficient  implementation,  and  it  is  the  same  for  both  the  with-souare->-oo*s 
and  without-square-roots  versions  of  the  algorithm. 
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2.6.2  Reverse  Flow  Method 

Ttie  reverse  flow  method  depends  on  the  interesting  fact  that  the  array 
can  compute  T*Y  by  inputting  the  components  of  the  vector  Y  at  the  right 
side  of  the  array,  performing  all  the  computations  in  the  reverse  order 
(using  instead  of  w^.),  and  extracting  the  outputs  at  the  top  of  the  array. 
This  process  is  illustrated  in  Figure  8.  Mow  the  broadcasts  take  place 
in  the  vertical  direction,  whereas  when  computing  TS  the  broadcasts  were 
in  the  horizontal  direction. 

A  proof  of  this  fact  for  the  algorithm  without  square  roots  (illus¬ 
trated  here  for  n  =  4)  depends  on  the  fact  that  T  can  be  written 
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The  i*"11  matrix  in  this  product  represents  the  computations  oerformed  by 
the  (n-i)^  row  of  the  array.  Hence, 

T*  =  [(T*)'1]'1  =  [(T'Vr1 
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DATA  FLOW  FOR  COMPUTING  T* 


Figure  8.  Reverse  flow  method. 
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The  icr  matrix  in  this  product  for  T*  represents  the  computations  performed 

4*  h 

by  the  (n-i)  n  column  of  the  array  during  reverse  flow. 

The  same  type  of  argument  shows  that  the  w^  are  the  entries  of  the 
matrix  L  in  the  Cholesky  factorization  M  =  LDl*.  Recall  from  Section  2.2 
that  L  =  T'1 


35 


Vwh 

w12  Vw  22 

W1 3  w23  ]/w33 

w14  w24  w34  1/w44 

so  that  the  w_.j's  are  the  entries  of  the  Cholesky  factor  L  in  M  =  LL*, 
except  that  the  w. .‘s  are  the  reciprocals  of  the  diagonal  elements  of  L. 

The  same  lockstep  pipelined  operation  for  data  movement  that  was  used 
for  T  is  used  for  T*,  except  backwards.  A  timing  diagram  for  the  entire 
operation  of  solving  MW1  =  S1  for  n  =  3  and  1  i  <_  4  i s  shown  in  Figure  9. 

A  shift  register  of  length  n-j  is  required  at  output  port  j  in  order  to 
resubmit  the  components  of  the  vector  Mj^TS  to  the  array  simultaneously. 

In  Figure  9,  Y^  denotes  Mj^TS^  and  the  solution. 

As  before,  we  analyze  the  timing  excluding  the  time  steps 
required  to  compute  the  w^.'s.  If  m  vectors  S1  are  to  be  input  at  the  top, 
the  total  time  required  is  m  (to  input  the  S^s)  plus  n  (to  compute  TS1) 
plus  1  (to  compute  Mj^TS1)  plus  1  (to  resubmit  Y1 )  plus  n  (to  compute  T*Y): 

T^p  =  m  +  2(n  +  1).  =  0(m  +  n) 

This  relation  is  to  be  contrasted  with  the  T ^  =  (m  +  1 ) ( n  +  1)  for  the  unit 
vector  method.  Analyzing  efficiency  as  before,  but  interpreting  P . .  as  two 
processors,  one  for  T  and  one  for  T*,  we  have  n(n  -  1)  processors  being  used 
for  m  time  units  and  n  processors  for  m  units,  for  an  efficiency 


erf  =  m/[m  +  2(n  +  1 )] 


This  efficiency  function  has  a  behavior  different  from  Eyy  for  the  unit  vector 
method:  EPP  is  low  for  small  m  and  large  n,  whereas  EMU  was  an  increasing 
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Figure  9.  Timing  diagram  for  the  reverse  flow  method. 
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function  of  both  m  and  n.  This  is  a  typical  speed/efficiency  design  tradeoff 
often  encountered  when  designing  parallel  systems  [Chen  1971a;  1971b]. 

2.6.3  Forming  the  Filter  Function  in  Transform  Space 

The  ultimate  output  of  the  system  under  consideration  is  the  filter 
function 


F  =  W*X 

where  X  is  a  column  vector  of  receiver  inputs.  The  algorithm  discussed  in 
the  preceding  two  subsections  (2.6.1  and  2.6.2)  computes  W,  assuming  another 
piece  of  hardware  is  available  to  form  the  inner  product  W*X.  It  is  possible 
to  avoid  computing  W  itself,  and  to  form  the  filter  function  as  follows: 

F  *  W*X 

=  (M*1 S)*X 

=  S*M‘1X 

=  S*T*(T*)'1M"1T'1TX 
=  {TS)*(TMT*)_1TX 
=  (TS)*Mt-1(TX) 

Recalling  that  MT  is  a  diagonal  matrix,  we  see  that  the  vector  (TS)*My1  may 
be  computed  once  and  stored,  and  then  its  inner  product  with  TX  computed. 

The  hardware  required  to  implement  this  approach  is  virtually  identical 
to  that  of  the  unit  vector  method.  After  the  w^'s  have  been  calculated 

*T.  C.  Chen,  "Unconventional  Superspeed  Computer  Systems,"  Spring  Joint 
Computer  Conference,  1971a,  pp.  365-366;  T.  C.  Chen,  "Parallelism,  Pipelining, 
and  Computer  Efficiency,"  Computer  Design,  Vol .  10,  1971b,  pp.  69-74. 
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(including  the  entries  of  M-^),  the  steering  vector  S  is  passed  through  the 

array,  and  the  values  £(T$)*My^  stored  in  a  vertically  connected  array  of 

processors  to  the  right  of  the  main  array,  as  in  Figure  10.  Then,  as  the  X 

vectors  are  pipelined  into  the  top  of  the  array,  their  results  (TX).  will  be 

available  at  PE.  where  C(TS)*M^].j  is  stored,  and  inner  products  of  the  vectors 

will  be  formed  just  as  in  Figure  10.  The  timing  diagram  is  similar  to  that 

in  Figure  7,  with  the  value  of  F  =  W*X  being  available  from  PE  n+1  time 

n 

steps  after  X  is  input. 


°3  -  °h  *  •  Qj 

Qj  -  [(TSJ-H^j 
°o  ■  0 

D"  ■  VTX)j 

=  (TS)*My1TX  =  OUTPUT 


Figure  10.  Computations  for  forming  the  filter  function  in 
transform  space. 


This  approach  has  two  advantages  over  approaches  described  in  the 

preceding  two  subsections.  First,  the  array  produces  the  desired  output 

directly  instead  of  just  the  weights  W.  The  hardware  is  as  simple  or  simpler 

than  the  unit  vector  method  implementation  and  much  simpler  than  the  reverse 

flow  method.  Second,  it  is  possible  to  continuously  update  the  w..'s  without 

^  j 

temporarily  halting  computation  of  the  weights;  in  this  way,  the  w..'s  are  always 
computed  from  the  most  recent  information.  This  point  will  be  discussed 
further  in  Secti.on  3. 
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The  disadvantage  of  the  transform  space  method  is  that  in  a  real 
system,  the  X's  may  be  sampled  at  a  far  faster  rate  than  the  array  is  able 
to  process  them.  The  X's  are  accepted  one  per  time  step  of  the  array  (and 
one  F  =  W*X  value  is  computed  once  per  time  step),  whereas  the  sampling  rate 
of  the  radar  may  be  many  times  faster.  (The  unit  vector  method  and  reverse 
flow  method  do  not  have  this  disadvantage  because,  given  W,  F  =  W*X  can 
be  calculated  extremely  fast.)  Whether  this  downsampling  would  adversely 
affect  the  overall  system  performance  depends  on  the  individual  system. 
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2.7  THE  SPECIAL  CASE  OF  A  SIDE-LOBE  CANCELLER 

It  is  possible  to  simplify  our  proposed  implementation  in  the  case  of 
a  side-lobe  canceller  (SLC),  because  of  the  special  form  of  the  steering  signal 
(all  zeroes,  except  for  a  one  on  the  component  corresponding  to  the  main  beam). 
It  turns  out  that  the  best  way  to  exploit  this  special  form  is  to  have  the  one 
in  the  steering  signal  occur  in  the  last  (n^)  component.  With  this  S,  TS  =  S 
(since  T  is  unit  lower  triangular);  thus,  no  work  is  required  to  pass  S  through 
the  array. 

The  effect  on  the  unit  vector  method  is  to  reduce  the  sum  defining  the 

output, 

(t*v)j  *  S  V*  ■  £  (7iI>kvk  '• 

to 

“j  ■  ' 

w  nn 

where  My  =  diag  (MTj j ) .  Thus,  only  one  outboard  processor  is  needed  at  output 

port  n  to  compute  VJj.  The  overall  timing  is  the  same. 

The  reverse  flow  method  also  simplifies  because  the  vector  to  be  passed 

backwards  through  the  array  is  known  a  priori,  (0,...,0,M-J  )^,  eliminating 

nn 

the  need  for  shift  registers  and  reducing  the  overall  length  of  the  pipe  by 
n  steps. 

The  transform  space  method  becomes  exceedingly  simple,  with  only  one 
outboard  processor  required  at  output  port  n,  and  since 

F  =  (TS)*My1(TX)  =  (My]TX)n 

its  output  is  simply  the  filter  function. 
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3,  SYSTEM  IMPLEMENTATION  CONSIDERATIONS 

This  section  describes  the  characteristics  of  different  versions  of  the 
basic  implementation  discussed  in  Section  2,  and  various  radar  engineering 
considerations  and  how  they  affect  the  choice  of  implementation.  We  consider 
different  ways  of  computing  the  internal  coefficients  (Section  3.1);  error 
sensitivity  and  fault  tolerance  (Section  3.2);  different  ways  to  perform  arith¬ 
metic  and  growth  bounds  for  intermediate  results  (Section  3.3);  implementation 
with  0(n)  processors  instead  of  0(n  )  (Section  3.4);  and  construction  of 
individual  processors  (Section  3.5).  In  Section  3.6  we  summarize  the  variety 
of  system  implementation  choices  discussed  in  Sections  3.1  through  3.5,  and 
show  how  different  radar  engineering  system  parameters  affect  implementation 
choice.  Finally,  in  Section  3.7,  we  select  a  typical  set  of  system  parameters 
and  give  a  detailed  design  for  a  Gram-Schmidt  processor  incorporating  those 
parameters. 

3.1  CALCULATION  OF  INNER  PRODUCTS 

The  algorithm  of  Section  2,1  requires  the  calculation  of  many  inner 

products  <Zj,zl>^.  The  method  given,  which  computes  the  sample  average 

£<Z^(k)z](k)  (see  Figure  5),  is  one  of  several  possible  methods.  All 
k  J  1 

the  methods  involve  averaging  over  samples,  as  in  Figure  5,  but  can  weight 
different  samples  differently: 

Em  =  t  •  (16) 

k=l  K  K 

Flere  E^  is  one  sample  value  (say,  zj  (k)Z^  (k) ),  w^  is  its  weight,  and  Em  is  the 
estimated  average  using  m  samples.  For  Em  to  be  an  unbiased  estimator  of 
the  true  average,  it  is  necessary  that  >  0  and  1.  Subject  to  this 
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mild  restriction,  we  are  free  to  choose  the  w.  's  so  that  our  estimate  F  has 

k 

desirable  statistical  properties,  such  as  being  able  to  closely  follow  the 
true  average  as  it  changes  in  time  without  being  confused  by  noise  spikes. 

It  is  well  known  that  if  the  underlying  distribution  from  which  the 
^k'S  dre  samP^ec*  ’s  stationary  in  time,  tnen  the  best  unbiased  estimator  of 
their  mean  is  an  equal  weighting  of  all  available  samples: 

E  m  =  lL  £k  •  07) 

k=l 

When  the  underlying  distribution  is  changing,  as  it  will  in  any  real  system, 
then  Eq.  (17)  introduces  a  bias,  because  it  weights  old  samples  as  much  as  new 
ones.  A  good  choice  of  wk's  must  satisfy  two  conditions:  It  must  average 
over  enough  samples  to  compute  a  statistically  significant  result  (at  least 
n  samples  (n^number  of  weights)  are  required  just  for  the  matrix  M  to 
be  nonsingular),  and  it  must  weight  the  recent  values  heavily  enough  to 
follow  the  average  quickly,  but  not  so  heavily  that  noise  spikes  confuse  it. 

The  optimal  choice  of  an  averaging  method  probably  would  vary  from 
system  to  system,  but  there  are  certain  schemes  which  work  well  in  many 
situations  and  are  worth  analyzing.  The  methods  have  different  implementa¬ 
tions,  depending  on  the  rest  of  the  system  (e.g.,  implementations  of  Sections 
2.6.2  and  2.6.1  versus  that  of  Section  2.6.3).  After  the  discussion  of  each 
method,  therefore,  we  analyze  various  implementations.  Finally,  we  present 
test  results  using  actual  radar  data.  These  results  allow  us  to  compare 
how  well  the  methods  maximize  the  signal-to-noi se  ratio  (SNR)  of  the  system. 


♦ 


46 


First,  we  have  block  averaging,  the  method  described  in  Section  2.  In 


this  method,  the  w^'s  are  periodically  reinitialized  to  zero,  data  are  passed 

through  the  array,  and  the  w..'s  are  formed,  giving  equal  weight  to  each 

J 

product  in  the  sum  (as  in  Eq.  (17)).  This  form  of  averaging  may  be  accom¬ 
plished  using  the  same  set  of  K  samples  to  comr  te  the  w^'s  in  each  row,  or 

different  sets.  If  the  same  set  of  samples  is  used,  buffering  is  required 
to  save  the  data.  Buffering  occurs  just  before  the  inputs  of  the  array, 
in  order  to  pass  the  data  through  the  array  n-1  times.  If  buffering  is  not 
used,  a  total  of  K(n-l)  samples  is  required,  and  the  method  is  called  cascaded 
block  averaging.  The  first  time  through  is  to  compute  the  first-row  averages 
(w^  and  w i j )  in  the  "if  k=i"  clause  in  Figure  5.  The  second  pass  performs 
the  outer  "else"  clause  of  Figure  5.  Zj(2)  =  Z 1 ( 2 )  -  w^jZj(2)  in  row  1  and  the 

"if  k=i"  clause  in  row  2.  In  general,  the  copy  of  the  data  input  to  the  array 

between  time  steps  (m- 1 ) K+l  and  mK  finally  arrives  at  row  m  and  is  used  to 

compute  w^.  between  time  steps  (2n-2)K+l  and(2m-l)K.  Steering  vectors  S  (in 
MW=S)  can  be  input  starting  at  time  mK+1 ,  and  all  the  values  of  TS  are  avail¬ 
able  at  time  (2n-3)K+l . 

If  no  buffer  is  to  be  used,  then  a  different  set  of  K  samples  is  input 
in  each  time  period  (m-l)K+l  to  mK.  This  method,  called  cascaded  block 
averaging,  will  compute  values  of  w^.  close  to  their  true  values  if  the 
underlying  distribution  determining  the  input  values  changes  slowly  and 
enough  samples  are  used.  Recall  the  result  of  Reed,  Mallett,  and  Brennan 
[19747  that  says  approximately  2n  samples  are  required  for  an  expected  SNR 
of  3  dB  within  optimum. 


I.  S.  Reed,  J.  D.  Mallett,  and  L.  E.  Brennan,  "Rapid  Convergence  Rate 
in  Adaptive  Arrays,"  IEEE  Trans,  on  Aerospace  and  Electronic  Systems, 
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Second,  we  have  exponential  averaging.  In  this  case,  we  take  a 


weighted  sum  of  the  old  average  and  the  new  data: 

Em+1  =  SEm+1  +  (l-S)Em  0  <  S  <  1  .  (18) 

If  S  is  close  to  1,  then  the  new  average  Em+^  is  very  responsive  to  new  data 
Em+i .  If  S  is  close  to  zero,  Em+^  changes  more  slowly,  being  determined  mainly 
by  the  old  average.  Exponential  averaging  can  be  implemented  either  with  or 
without  a  buffer,  as  in  block  averaging,  or,  once  the  system  has  been  started 
up  and  w. .  values  are  available,  the  w..'s  can  be  updated  by  inputting  just 

'0  *  vJ 

a  few  sample  voltage  vectors  and  averaging  them  in.  This  is  an  advantage 
over  block  averaging:  With  block  averaging,  each  time  the  w^.'s  are  to  be 
updated,  a  lag  time  of  (2n-3)K  is  needed,  where  K  is  at  least  n  and 
preferably  2n;  K  may  be  significantly  smaller  with  exponential  averaging. 

Note  that  it  is  possible  either  to  exponentially  smooth  w. .  itself,  or 

*  J 

to  smooth  its  numerator  and  denominator  separately,  before  dividing  them. 

(If,  in  addition  to  this  separate  smoothing,  the  denominator  is  approximated 
by  its  closest  power  of  2,  the  problem  of  doing  division  is  reduced  to 

shifting). 

Window  Averaging 

Finally,  we  have  window  averaging.  In  this  case,  the  last  K  samples 
are  always  used  to  compute  the  w.  .'s,  thus  using  the  most  recently  available 

*  J 

information  at  all  times.  This  method  requires  saving  a  buffer  of  the  last 

K  values  used.  When  a  new  value  (e.g.,  zNz1.)  is  computed,  it  is  added  to 

*  J 

the  window  average,  put  on  top  of  the  buffer,  and  the  oldest  value  removed 
from  the  bottom  of  the  buffer  and  subtracted  from  the  average.  The  buffer 
can  conveniently  be  implemented  as  a  shift  register.  In  addition  to  always 
using  the  most  recent  data,  window  averaging  has  the  same  advantage  as  ex- 


ponential  over  block  averaging:  It  does  not  always  require  a  lag  of  (2n-3)K 
samples  to  update  the  w.c.'s.  The  disadvantage,  of  course,  is  the  need  for  a 
long  shift  register  (at  least  n  and  probably  2n  words)  for  each  average 
(n(n-l)/2  or  n(n-l),  depending  on  the  implementation). 

The  transform  space  method  of  Section  2  can  be  used  as  well  as  any  of 
the  above  methods,  but  its  real  advantage  lies  in  the  fact  that  since  samples 
(in  contrast  to  steering  vectors)  are  constantly  being  passed  through  the 
array,  the  w.  .'s  can  be  updated  simultaneously  with  the  calculation  of  filter' 
functions.  Hence,  when  exponential  or  window  averaging  is  used,  no  delay  at 
all  (after  startup)  is  needed  to  update  the  weights.  Each  filter  function 
is  always  a  function  of  the  most  recent  information. 

Reed,  Mallett,  and  Brennan  [1974]  have  shown  that  the  expected  value 

of  the  achieved  SNR  is  10  log-jg[(K+2-n)/(K+l )]  dB  below  the  SNR  achieved  with 

optimal  weights.  This  expected  loss  is  about  3  dB  when  the  number  of  samples, 

K, equals  2n.  Of  course,  if  a  great  deal  of  clutter  is  present,  more  samples 

★ 

may  be  required  to  average  it  out.  Also,  Brennan  [1974]  has  recently  shown 
that  if  the  weights  determined  by  K  samples  are  in  turn  used  to  form  filter 
functions  from  those  came  samples  instead  of  a  different  set  of  samples  (as 
in  nontransform  space  operation),  then  the  expected  SNR  is  actually  higher 
(see  Appendix  A) . 

We  now  present  some  qraphs  of  system  performance  versus  sample  size  for 
both  real  and  simulated  data,  varying  numbers  of  weights,  and  varying  aver¬ 
aging  schemes  (Figures  11  through  21).  System  performance  is  measured  by 
how  many  decibels  down  the  achieved  SNR  is  from  the  optimal  SNR: 

★ 

L.E.  Brennan,  "Performance  of  the  Sample  Covariance  Matrix  Algorithm 
for  4daptive  Arrays,"  unpublished  manuscript,  19  July  1979. 


Performance  =  10  ^ °9 1 q[ ( I S *W | ^/W*MW)/S*M  ^S)] 

The  real  data  is  taken  from  an  operational  5-weight  adaptive  system,  and  the 
simulated  data  model  is  described  in  Appendix  H.  The  true  matrix  M  for  the 
real  data  is  estimated  by  averaging  over  all  available  samples  (100).  The 
different  algorithms  used  are  Cholesky,  Gram-Schmidt  using  blocked  averaging, 

Gram-Schmidt  using  exponential  averaging  (for  various  exponential  weights, 
i.e.,  the  factor  S  in  Eq.  (18)),  and  Gram-Schm:dt  using  cascaded  block 
averaging.  In  the  case  of  simulated  data,  the  various  model  parameters  are 
the  number  of  weights,  number  of  jammers,  location  and  power  of  the  jammers, 
and  the  ratio  of  strongest  jammer  power  to  receiver  noise  power  (per  receiver). 

This  last  quantity  is  the  spectral  condition  number  of  the  matrix.  The  use 
of  tapped  delay  lines  (one-sample  long)  is  also  indicated. 

For  the  real  data  (Figures  11  and  13)  we  see  that  the  results  for 
Cholesky  and  blocked  G-S  almost  overlap.  They  are  mathematically  identical 
algorithms,  and  any  small  difference  is  attributable  to  roundoff  error  (all 
arithmetic  is  32-bit  floating  point  with  24-bit  mantissas).  Cascaded  blocked 
G-S  has  essentially  the  same  performance  as  blocked  G-S;  but  exponential  G-S 
lags  behind,  performance  improving,  but  still  poor  as  the  weight  decreases 
(which  means  the  new  samples  are  weighted  less  than  the  old). 

For  the  simulated  data  with  5  weights  (Figures  14  through  17), 

Cholesky,  blocked  G-S, ana  cascaded  blocked  G-S  all  have  virtually  identical 
performance,  but  exponential  becomes  poorer  as  the  number  of  jammers  increases 
(from  1  to  4). 

With  10-channel  simulated  data  using  5  jammers  (Figure  18),  cascaded 
G-S,  blocked  G-S, and  Cholesky  are  again  virtually  identical.  In  fact, 
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Figure  14.  Simulated  data: 

1.  Number  of  weights  =  5 

2.  No  tapped  delays 

3.  Number  of  jammers  =  1 

4.  Location  and  power  of  jammer:  (-36.9°,  0  dB) 

5.  Ratio  of  power  of  strongest  jammer  to  receiver 
noise  power  =  50  dB. 
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Figure  16.  Simulated  data: 

1.  Number  of  weights  =  5 

2.  No  tapped  delays 

3.  Number  of  jammers  =  3 

4.  Location  and  power  of  jammers: 

#1  (-36.9°,  6.67  dB) 

#2  (-11 .5°,  3.33  dB) 

#3  (+11.5°,  0  dB) 

5.  Ratio  of  power  of  strongest  jammer  to  receiver 
noise  power  =  40  dB. 
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-  -  CHOLESKY 

“  -  exponential  gram-schmidt,  weight  -  .25 

*  -  BLOCKED  GRAM-SCHMIDT 
O  -  CASCADED  BLOCKED  GRAM-SCHMIDT 
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Simulated  data: 

1.  Number  of  weights  =  5 

2.  No  tapped  delays 

3.  Number  of  jammers  =  4 

4.  Location  and  power  of  jammers  : 

#1  (-36.9°,  7.5  dB) 

n  (-11.5°,  5.0  dB) 

# 3  (+11.5°,  2.5  dB) 

H  (+36.9°,  0  dB) 

5.  Ratio  of  power  of  stronqest  jammer  to  receiver 
noise  power  =  50  dB. 
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Figure  18.  Simulated  data: 


1.  Number  of  weights  =  10 

2.  No  tapped  delays 

3.  Number  of  jammers  =  5 


Location  and  power  of  jammers: 
#1  (-53.1°,  8.0  dB) 

#2  (-36.9°,  6.0  dB) 

#3  (-23.6°,  4.0  dB) 

#4  (-11.5°,  2.0  dB) 


ir-T  \  l  I  .  J  )  C.  »  U  UL 

#5  {  0.0°,  0  dB) 


Ratio  of  power  of  strongest  jammer  to  receiver 
noise  power  =  40  dR. 


1 


cascaded  G-S  performs  somewhat  better,  because  it  uses  n-1  times  as  much  data 
as  the  other  methods.  Again,  exponential  performs  the  poorest,  with  a  weight 
of  0.25  being  best  in  this  case  (Figure  19). 

With  35-  and  50-channel  simulated  data,  using  10  and  20  jammers, 
respectively,  blocked  and  cascaded  G-S  and  Cholesky  are  again  virtually 
identical,  with  exponential  G-S  performing  roorly  (Figures  20  and  21, 
respectively). 

Seven  of  the  nine  cases  achieve  an  SNR  within  3  dB  of  optimal 
after  2n  samples;  the  other  two  cases  are  within  5  dB. 

The  location  of  the  jammers  (first  number  in  parentheses  in  item  4 
of  the  simulated-data  captions)  is  given  in  degrees  from  boresight,  and, 
jammer  power  (second  number  in  parentheses)  is  expressed  as  decibels  below 
the  strongest  jammer. 

We  spent  a  great- deal  of  effort  analyzing  cases  with  low  numbers  of 
weights  because  the  effects  of  changes  in  the  number  of  jammers  and  exponen¬ 
tial  weights  show  up  more  quickly. 
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Figure  19.  Simulated  data: 

1.  Number. of  weights  =  10 

2.  No  tapped  delays 

3.  Number  of  jammers  =  5 

4.  Location  and  power  of  jammers: 

#1  (-53. 1°,  8.0  dB) 

#2  (-36.9°,  6.0  dB) 

#3  (-23.6°,  4.0  dB) 

#4  (-11.5°,  2.0  dB) 

#5  (0.0°,  0  dB) 

5.  Ratio  of  power  of  strongest  jammer  to  receiver 
noise  power  =  40  dB. 
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-  -  CHOLESKY 

.  -  EXPONENTIAL  GRAM-SCHMIDT,  WEIGHT  =  .25 

-  BLOC  KED  GRA'T-SCHMIDT 

-  CA5CADED  BLOCKED  GRAM-SCHMIDT 


Figure  20.  Simulated  data: 

1.  Number  of  weights  -  35 

2.  No  tapped  delays 

3.  Number  of  jammers  =  10 

4.  Location  and  power  of  jammers: 

*1  (-70.5°,  9.0  dP)  *6  (*41.1°,  4.0  dP) 

n  (-62.3",  8.0  dB)  ¥ 7  (-36.9%  3.0  dP) 

*3  (-55.95°,  7.0  dB)  « 8  (-32.9°,  2.0  dB) 
H  (-50.5°,  6.0  dB)  *9  (-29.1°,  1.0  dB) 

#5  (-45.6°,  5.0  dB)  *10(-25.4°,  0  dB) 

5.  Ratio  of  power  of  stronnest  jammer  to  receiver 
noise  power  =  27  dB. 
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Figure  21.  Simulated  data: 

1.  Number  of  weights  =  50 

2.  No  tapped  delays 

3.  Number  of  jammers  =  20 
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5.  Ratio  of  power  of  strongest  jammer  to  receiver  noise 
power  =  20  dB. 
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3.2  FAULT  TOLERANCE  AND  NOISE  SENSITIVITY 

In  this  section  we  analyze  the  effect  of  the  system's  different 
failure  modes  on  performance,  as  measured  by  the  SNR. 

External  Failure  Mode 

The  first  category  of  failure  mode  is  external,  which  means  that 
although  the  processor  array  is  functioning  normally,  something  is  wrong  with 
the  antenna/receiver  subsystem  supplying  inputs.  The  possible  failures  we 
examine  are  1)  input  j  is  zero  (others  normal),  2)  input  j  is  a  nonzero 
constant  (others  normal),  and  3)  Gaussian  noise  of  Dower  p  is  added  to  input 
j  (others  normal ) . 

When  the  input  j  is  zero,  the  system  continues  to  function,  computing 


the  correct  weights  for  an  n-1  weight  system  and  a  zero  weight  for  input  j. 

To  verify  this  phenomenon,  note  that  all  the  w^.  values,  1  <  i  j  - 1  ,  are  zero 
because  (in  the  notation  of  Figure  5)  w^  =  wGj  +  Z.(k)  ■  2 1  ( k )  and  zl  is 
zero,  so  w—  is  zero,  is  zero,  w^  is  zero,  and  so  on.  Thus,  Zx  =  0, 
and  (with  the  odd  convention  that  1/0=0)  we  have  Wj..=  0,  3  +  'l  5,  i  5.  n>  since  wi 
is  Zj  =  Wj •  (1/0)  =  0.  Thus  the  values  computed  in  columns  j  +  1  to 

n  of  the  array  have  nothing  subtracted  from  them  in  row  j  of  the  array:  They 


are  independent  of  input  j.  Since  the  values  computed  in  columns  1  to  ,j-1 


are  obviously  independent  of  column  j,  all  the  other  computed  values  are 


computed  correctly  as  though  input  j  were  missing.  Since  1/w^  is  taken  to 
be  zero,  and  the  output  at  port  j  on  the  right  of  the  array  is  multiplied  by 


1/w. we  see  the  system  operates  as  claimed  for  the  transform  space  and  unit 
vector  methods,  because  they  get  a  zero  contribution  from  output  j  and  only 


operate  the  array  in  the  forward  direction  discussed  above  in  this  paragraph. 
To  see  that  reverse  flow  also  operates  normally,  note  that  the  zero  input  at 
port  j  on  the  right  remains  zero  as  it  moves  left,  across  the  array  (since  the 


w..'s  in  row  j  are  zero)  and  when  it  is  broadcast  vertically  by  P.  .  ,,  it  has 
J i  J  1 

no  effect  on  the  values  computed  by  rows  1  to  j-1.  Since  rows  j+1  to  n  are 
similarly  unaffected,  we  see  all  weights  except  j  are  computed  normally,  and 
weight  j  is  zero  as  desired. 

The  optimal  SNR  of  the  n-1  weight  system  can  be  expressed  in  terms  of 

XL 

the  old  optimal  SNR  with  n  weights,  the  j  optimal  weight  w.  (when  all  n 
are  available)  and  mJJ  =  jth  diagonal  element  of  the  inverse  of  the  true  n  x  n 
covariance  matrix: 


SNRNEW0PT  "  SNR0LD0PT  jj 

m 


(19) 


This  result  will  be  derived  later. 

When  input  j  is  a  nonzero  constant,  a  similar  analysis  shows  incorrect 
weights  are  computed.  Approximately  0  values  for  w. .  in  column  j  are  computed, 

*  J 

since  z]  is  Gaussian  with  zero  mean.  Thus  is  constant  and  nonzero,  and 
1/w,,  is  nonzero.  The  values  of  w . ,  in  row  j  are  again  approximately  zero, 

J  J  J  1 

but  output  j,  even  after  multiplication  by  1/w .., remains  approximately  con- 

J  J 

stant  and  nonzaro.  Thus  incorrect  values  are  computed  by  all  three  methods. 

To  examine  the  case  of  Gaussian  noise,  we  need  some  general 
results  on  the  effects  of  perturbations  in  the  weights  on  the  SNR.  These 
results  are  derived  in  Appendix  B  and  presented  below. 

The  SNR  is  given  by: 

SNRW  =  |S*W|2/W*MW  ,  (20) 

where  SNR^  is  the  SNR  achieved  using  W  as  weights.  When  using  the  optimal 
weights,  Wgpj  =  M  ^S,  we  have 

SNRort  =  S*W  =  W*MW  =  S*M-1S  •  (21) 
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Suppose  that  W  =  W^p^  +  E,  where  E  is  an  error  vector.  Then  we  may  write 
SNR^  in  two  equivalent  forms: 

SNRW  “  ('  '  Ww)’SNR0PT  +  ™  '  SNRE 

“  SNR0PT  ■  E*(m  •  STO^)E  ■  W  •  <*> 

The  first  expression  for  SNR,^  shows  it  is  a  weighted  linear  combination  of 
SNRqpi-  and  SNR^,  and  the  second  expression  shows  how  the  degradation  in  SNR 
depends  on  the  eigenvalues  and  eigenvectors  of  the  positive  semidefinite 
Hermitian  matrix  (M  -  SS*/SNRopT).  In  particular,  a  lower  bound  for  SNRW  is 

SNRW  >  SNR0pT  -  Xmax(M)  •  I|E||2  +  0(| JE| |3)  ,  (23) 

which  shows  that  the  SNRW  depends  quadratical ly  on  E,  and  is  worse  if  the 
largest  eigenvalue  of  M,  X„„(M),  is  large.  The  bound  on  Eq.  (23)  can  be 
attained  if  S  is  orthogonal  to  the  eiaenvector  of  fl  belonqinq  to  A  (M), 
and  if  E  is  proportional  to  it.  Physically,  since  eigenvectors  of  the 
covariance  matrix  M  are  the  expected  signal  of  noise  sources  and  the  eigen¬ 
value  is  the  power  of  the  noise  source,  the  worst  deoradation  in  SNR  shoud 
occur  when  the  steering  signal  S  is  aininr  the  array  in  a  direction  ortho¬ 
gonal  to  the  strongest  noise  source  but  E  is  aiming  exactly  in  the  direction 
of  che  strongest  noise  source. 

We  may  also  examine  the  effects  of  nerturbati ons  on  arbitrary 
(nonoptimal )  weights  W.  Suppose  that  instead  of  W  weights,  W=W+E  are 
used.  Then  the  new  SNR  using  W  is 


SNR*  =  SNRW 


+  0(||E 


(24) 
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2 

where  the  quadratic  and  higher-order  terms  are  indicated  by  0(||E||  ). 

Now  the  dependence  on  E  is  linear,  with  the  effect  on  SNR^  being  largest 
when  E  is  proportional  to  (SS*-SNR^ *M)W  (the  SNR  can  increase  or  decrease), 
and  to  a  linear  approximation  lies  in  the  range 


When  E  is  orthogonal  to  (M  -  SS*/SNRwjW,  the  dependence  of  SNR^  on  E  is 
quadratic.  Thus, 


SNRW  =  linear  terms  in  Eq.  (20) 

+  ^  f(,pc(U*MEy  F*\/M  SS*  \  ' 

iw  Re  rKe  ww w  ■ E  AM '  swyE 

+  higher-order  terms  in  ||E||  .  (26) 


When  W  =  Wqpj,  Eq.  (26)  simplifies  to  Eq.  (23). 

Now  let  us  return  to  the  specific  problems  of  Gaussian  noise  of  power 
p  being  added  to  input  j.  (The  following  results  are  also  derived  in 
Appendix  8).  Adding  this  noise  is  equivalent  to  addino  p  to  the  „ith 


A 

diagonal  elements  of  the  true  covariance  matrix  M  to  get  the  new  matrix  M, 


M  =  M+A,  where  A^  =  {p  if  k=£=j,0  otherwise}. 

of  m"1  r 


66 


-1  ii  *  ^-1 

where  M  =  {m. .}  and  M  =  {m  Thus  the  new  weights  W  =  M  S  can  be 

*  J 

expressed  in  terms  of  the  old  weights 


1  /  -p  \ 

0  ' 

\l+Pmjj' 

wj 

0J 

•  th 

+-J  row 


(28) 


Hence  the  new  weights  are  the  sum  of  the  old  weights  and  the  jth  column  of 
M  "*  times  -pw./(l  +  pmJ^).  In  particular,  the  new  weight 

J 

^  i  i 

w.  =  w./n  +  pmJJ) 

J  J 


So  as  p  increases,  w.  decreases,  approximately  in  proportion  to  1/p  for  p 

J 

A 

large.  As  p  approaches  infinity,  w.  approaches  zero  and  the  system  operates 
as  though  there  were  only  n-1  weights,  because  all  the  information  in 
channel  j  is  being  obscured  By  the  large  noise. 

We  must  now  distinguish  between  two  subcases,  depending  on  when  the 
noise  is  added  to  the  system.  Case  1  occurs  when  the  noise  is  added  by  the 
antenna/receiver  subsystem,  so  that  the  sample's  voltage  vectors  used  to 
compute  the  filter  functions  also  contain  noise,  and  case  2  occurs  when  the 
noise  is  added  later,  thus  affecting  the  weights  only  and  not  the  sample 
voltages  for  the  filter  function.  In  the  first  case,  where  the  noise  is 
present  from  the  beginning,  it  is  as  though  the  true  covariance  matrix  of  the 
noise  process  changes  (by  adding  p  to  the  diagonal);  in  the  second  case,  the 
true  covariance  matrix  remains  the  same,  but  the  weights  change. 

In  the  first  case,  we  can  compute  the  optimal  SNR  of  the  new  system 
with  noise  SNRNEW0PT=S*M' *S  using  Eg.  (23): 

SNRNEW0PT  =  SNR0LDCiPT  +  •  (29) 
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Thus,  the  optimal  SNR  achievable  with  noise  p  added  varies  as  shown  in 


Figure  22.  For  p  small  it  decreases  approximately  linearly  with  a  slope 
of  -lwj|  •  If  |wj|  is  large,  then  system  performance  depends  heavily 
on  the  jth  channel;  hence,  adding  noise  to  that  channel  is  particularly 
bad.  Also,  if  is  small,  then  the  lower  bound  for  SNR^Qpy, 

SNR0LD0PT  "  (lwj  is  smaII- 


Figure  22.  Effect  of  added  noise  on  optimal  SNR. 

In  fact,  if  S  =  ej  =  {0, . . . ,0,1 ,0, . . . ,0}  (1  in  jth  place), then 
SNRqpt  “  m'"  ®nd  w j  =  m '  .  Then  ^^£yQpy  =  j [p  I/SNRqlqqpj)  »  which 
approaches  0  as  p  approaches  infinity. 

It  is  possible  to  make  a  physical  interpretation  of  mJJ  being 
small.  Mathematically,  it  means  the  covariance  matrix  Mp=M  has  an  eigen¬ 
vector  approximately  equal  to  e^  =  (0, . . . ,0,1 ,0,. . . ,0)  (1  in  jth  entry)  whose 
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corresponding  eigenvalue  is  large,  which  the  matrix  Mn  -j  =  M  with  row  and 
column  removed  does  not  have.  Physically,  this  means  the  n  weight  system 
with  matrix  is  capable  of  adapting  to  a  strong  noise  source,  with  steering 
signal  e . ,  whereas  the  n-1  weight  system  with  matrix  Mn  ^  cannot.  Since 


n-1 

n 

j=i 


(a(x^  =  eigenvalue  of  matrix  x),  m33  will  be  small  if  this  relationship 

of  eigenvalues  and  vectors  of  M  and  M  ,  holds.  In  other  words,  m33  is 

n  n- 1 

small  if  the  jth  channel  is  important  for  the  system  to  be  able  to  adapt 
to  some  large  noise  source.  The  system  will  then  be  especially  sensitive 
to  noise  in  that  channel. 

In  the  second  case,  where  noise  is  added  late  so  that  it  affects 
the  weight  computations  but  not  the  true  covariance  matrix  M,  we  may  use 
Eqs.  (22),  (24),  and  (25)  to  analyze  system  performance.  Assuming  that 
the  weights  used  after  perturbations  are  the  optimal  weights  for  the  noise 

**  /si 

field  seen  by  the  array,  i.e.,  W  =  M  5,  we  may  use  Eqs.  (21)  and  (26)  to 
compute  the  new  SNR^: 


SNRQ  -  SNR0pT  -  (p2|Wj|2  mjj^SNR0pT(l  +  pmJJ) 

)]  •  [snropt 


-  P2 1  w  .  I  2  ( 2  +  pm33  ' 
J 


iw  jVj] 


(30) 


2  ^ 

Note  that  |w  .  |  /m^.  =  SNRp,  where  E  =  W  -  WQpy.  SNR^  is  a  decreasing 
function  of  p,  with  a  lower  bound  of 

SNRA  >  SNRopT  -(|Wj|2/mjj)  ,  <31> 
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which  is  the  same  lower  bound  as  for  SNR^^^pj,  given  in  Eq.  (25). 

Thus,  if  an  infinite  amount  of  noise  is  added  to  channel  j,  effectively 
turning  the  system  into  an  n-1  weight  system,  the  degradation  in  SNR  can 
be  bounded  independent  of  where  in  the  system  the  noise  is  added.  In  both 
cases,  a  large  j^  weight  |w  .|  or  a  small  mJJ  indicates  the  system  is 
particularly  sensitive  to  noise  in  that  channel.  This  analysis  also  proves 
Eq.  (16). 

We  now  present  some  simulation  results  of  adding  noise  to  a  5-weight 
system  with  3  jammers.  Figure  23  is  a  control  case  with  no  noise  added. 

The  total  noise  power  (summed  over  all  inputs  to  which  noise  is  added)  is 
4  dB  above  the  strongest  jammer.  These  results  correspond  to  case  2  of 
the  above  discussion,  where  the  noise  affects  the  weight  calculation  only. 

Apparently,  inputs  2  and  3  (Figures  25  and  26)  are  very  insensitive 
to  noise-,  inputs  1  and  4  (Figures  24  and  27)  are  somewhat  sensitive;  and 
input  5  (Figure  28)  is  very  sensitive.  This  sensitivity  was  determined  by 
looking  at  the  steady-state  cancellation  for  either  G-S  or  Cholesky  (i.e., 
performance  degradation  for  20  samples).  Since  the  steering  signal  for  these 
latter  simulations  is  (0,0,0,0,1),  i.e.,  an  SLC  with  channel  5  as  the  main, 
channel  5  should,  by  far,  be  the  most  sensitive.  The  other  inputs  are  not 
as  sensitive.  Since  there  are  only  3  jammers,  as  long  as  only  one  input 
is  affected  enough  degrees  of  freedom  remain  to  effectively  cancel  the 
jammers  (unless,  of  course,  channel  5  is  affected).  When  3,  4,  or  5  channels 
are  affected  (as  in  the  last  three  plots,  Figures  29  through  31),  so  that 
there  are  not  enough  degrees  of  freedom  to  cancel  the  jammers  and  the  extra 
noise,  performance  degrades  markedly. 
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Figure  23.  Simulated  data: 

1.  Number  of  weights  =  5 

2.  No  tapped  delays 

3.  Number  of  jammers  =  3 

4.  Location  and  power  of  jammers: 

#1  (-36.9°,  6.67  dB) 

§ 2  (-11. 5°-,  3.33  dB) 

#3  (+11.5°,  0  dB) 

5.  Ratio  of  power  of  strongest  jammer  to  receiver 
noise  power  =  40  dB 

6.  No  noise  added. 
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CHOLESKY 

EXPONENTIAL  GRAM-SCHMIDT ,  WEIGHT 
BLOCKED  GRAM-SCHMIDT 
CASCADED  BLOCKED  GRAM-SCHMIDT 


NUMBER  C"  SRMPl.ES 

Figure  24.  Simulated  data: 

1. 

Number  of  weights  =  5 

2. 

No  tapped  delays 

3. 

Number  of  jammers  =  3 

4. 

Location  and  power  of  jammers: 

#1  (-36.9°,  6.67  dB) 

#2  (-11.5°,  3.33  dB) 

#3  (+11.5°,  0  dB) 

5. 

Ratio  of  power  of  strongest  jammer  to  receiver 
noise  power  =  40  dB 

6. 

Noise  added  to  channel  1  of  total  power 

4  dB  above  strongest  jammer. 

-  -  CHOLESKY 

-  -  EXPONENTIAL  GRAM-SCHMIDT,  WEIGHT  =  .EE 

.  -  BCOCKEC  GRAM-SCHMIDT 

v  -  CASCADED  BLOCKED  GRAM-SCHMIDT 


Figure  25.  Simulated  data: 


1.  Number  of  weights  =  5 

2.  No  tapped  delays 

3.  Number  of  jammers  =  3 

4.  Location  and  power  of  jammers: 

#1  (-36.9°,  6.67  dB) 

n  (-11.5°,  3.33  dB) 

#3  (+11.5°,  0  dB) 

5.  Ratio  of  power  of  strongest  jammer  to  receiver 
noise  power  =  40  dB. 

6.  Noise  added  to  channel  2  of  total  power 
4  dB  above  the  strongest  jammer. 
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-  -  CHOLESKY 

_  -  EXPONENTIAL  SRAM-SCHMIDT,  HEIGHT  =  .25 
>  -  BLOCKED  GSAM-SCHMIOT 
O  -  CASCADED  BLOCKED  GRAM-SCWIDT 
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Figure  26.  Simulated  data: 

1.  Number  of  weights  =  5 

2.  No  tapped  delays 

3.  Number  of  jammers  =  3 

4.  Location  and  power  of  jammers: 

#1  (-36.9°,  6.67  dB) 

#2  (-11.5°,  3.33  dB) 

#3  (+11.5°,  0  dB) 

5.  Ratio  of  power  of  strongest  jammer  to  receiver 
noise  power  =  40  dB 

6.  Noise  added  to  channel  3  of  total  owoer 
4  dB  above  strongest  jammer. 
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Z  CHOLESKY 

-  '  EXPONENTIAL  GRAM- SLUM! 

*■  BLOCKED  JVW-ECHMIOT 
<0  -  CASCADFD  3LXKFD  0R«.-1- 


Figure  27.  Simulated  data: 


Number  of  weights  =  5 
No  tapped  delays 
Number  of  jammers  =  3 
Location  and  power  of  jammers: 
#1  (-36.9°,  6.67  dB) 

#2  (-11. 5°,  3.33  dB) 

iS  >.ii  ro’  r\  jn\ 


#3  (+11.'5°;  O’  dB) 

Ratio  of  power  of  strongest  jammer  to  receiver 
noise  power  =  40  dB 

Noise  added  to  channel  4  of  total  power 
4  dB  above  strongest  jammer. 
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r  CHOLESM 

EXPOTJCNTIAL  3RAM- SCHMIDT,  WEIGHT  =  ..25 
X  -  BLOCKED  flUM-SCHHIDT 
■T  ■  CASCADED  BLOCKED  S3AM-SCHMIDT 


Figure  28.  Simulated  data: 

1.  Number  of  weights  =  5 

2.  No  tapped  delays 

3.  Number  of  jammers  =  3 

4.  Location  and  power  of  jammers: 

#1  (-36,9°,  6,67  dB) 

#2  (-11.5°,  3.33  dB) 

#3  (+11.5°,  0  dB) 

5.  Ratio  of  power  of  strongest  jammer  to  receiver 
noise  power  =  40  dB 

6.  Noise  added  to  channel  5  of  total  power 
4  dB  above  strongest  jammer. 
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CHOLESK':  . _ 

EXPOln-fi  I  i’'L  "  -  - 1 1  ■  -  1  ■ 
BLOCKED  SS.V'i-SCrXjT 
CASCADED  ElCCKEC  SAA.<- . 


Figure  29.  Simulated  data: 

1.  Number  of  weights  =  5 

2.  No  tapped  delays 

3.  Number  of  jammers  =  3 

4.  Location  and  power  of  jammers: 

#1  (-36.9%  6.67  dB) 

#2  (-11.5%  3.33  dB) 

*3  (  +  11.5%  0  dB) 

5.  Ratio  of  power  of  stronoest  jammer  to  receiver 
noise  power  =  40  dB 

6.  Noise  added  to  channels  1,  ?,  3  of  total  power 
4  dB  above  strongest  jammer. 
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-•  ■  CHUtSKV 

-  ■  EXPONENTIAL  SRAM-SCR'lIiT,  WEIGHT  =  .25 
*•  ■  BtOCKEG  GXAM-SCHMIDT 


:igure  30.  Simulated  data: 

1.  Number  of  weights  =  5 

2.  No  tapped  delays 

3.  Number  of  jammers  =  3 

4.  Location  and  power  of  jammers: 

#1  (-36.9°,  6.67  dB) 

#2  (-11.5°,  3.33  dB) 

#3  (+11.5%  0  dB) 

5.  Ratio  of  power  of  strongest  jammer  to  receiver 
noise  power  =  40  dB 

6.  Noise  added  to  channels  1,  2,  3,  and  4  of  total 
power  4  dB  above  strongest  jammer. 
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CHOLESKY 

EXPONENTIAL  GIVM-SCHMIDT,  HEIGHT  =  .25 
SL&CtCEJ  jiW-SCHf'ICT 

_  .  _ CA^CAdFJ  ai  y.KFi;  y.winT  _ _ 


Figure  31.  Simulated  data: 

1.  Number  of  weights  =  5 

2.  No  tapped  delays 

3.  Number  of  jammers  =  3 

4.  Location  and  power  of  jammers: 

#1  (-36.9°,  6.67  dB) 

#2  (-11.5°,  3.33  dB) 

#3  (+11. 5°,  0  dB) 

5.  Ratio  of  power  of  strongest  jammer  to  receiver 
noise  power  =  40  dB 

6.  Noise  added  to  channels  1,  2,  3,  4,  and  5  of 
total  power  4  dB  above  strongest  jammer. 


79 


Internal  Failure  Mode 


In  the  inte'ial  failure  mode,  processor  P..  outputs  only  constants. 

1  J 

This  case  is  similar  to  the  external  failure  mode  where  input  1  is  constant. 
Consider  first  the  case  where  the  output  of  is  identically  zero.  Outputs 
1  to  j-1  remain  unaffected.  As  in  the  external  case,  all  the  w^.'s  and 
outputs  of  P^j  for  k>i  are  zero;  so,  w^.j  is  zero,  and  the  rest  of  the  array 
(rows  j  to  n-1)  operate  as  though  the  zero  had  been  input  at  the  very  top  of 
column  j,  and  we  have  normal  n-1  weight  operation.  The  reverse  flew  situtation 


is  similar,  so  that  if  P..  fails  by  outputting  only  zero,  it  is  equivalent  to 

•  J 

the  external  failure  situation,  where  the  array  operates  correctly  as  an  n-1 


weight  system.  When  P„  puts  out  a  nonzero  constant,  incorrect  weights  are 
calculated,  as  with  the  external  failure  mode. 
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3.3  GROWTH  OF  INTERMEDIATE  RESULTS 


In  this  section,  we  derive  probabilistic  bounds  on  the  growth  of 
intermediate  results  during  the  computations. 

We  first  consider  the  algorithm  without  square  roots  and  using  block 
averaging.  We  further  subdivide  the  case  into  the  situation  where  1  )  the  actual 
covariance  matrix  M  is  known  (Figure  4a)  and  2)  where  only  samples  are  known 
(Figure  5a).  When  M  is  known,  we  have  the  following  bounds  on  the  zl(k)'s: 

J 

X  •  <  Z?(k)  <  m  k>i ;  these  bounds  are  sharo  (32a )* 

min  1/  x  v 


| Z j ( k ) | <  2  •  mmax;  sharp  within  a  factor  of  4  (32b! 

k 

| Z  ( k )  |  £  t^ax;  sharp  within  a  factor  of  2  (32c) 

/  mmav 

I  w . . !  <  — -  •,  sharp  within  a  factor  of  2  ,  (32d) 

1J  min 

where 

mmax  =  m^lmij  I  (M  s^ij»  • 

Amin  =  smallest  eigenvalue  of  M,  and 
A„,a„  =  laraest  eigenvalue  of  M  . 

1'idX 


The  value  mmax  can  be  bounded  simply  by  using  the  number  of  bits  and 

normalization  used  for  each  sample, and  the  number  of  samples  used.  If  the  largest 

2 

sample  value  is  X  (in  absolute  value),  and  S  samples  are  used,  iti  v  <  SX  .  Note 

that  the  sample  matrix  is  S  times  as  larqe  as  the  true  matrix  (S  =  number  of 

samples);  because  we  do  not  divide  the  inner  products,  we  calculate  by  S.  The 

m  used  here  is  the  maximum  possible  entry  of  the  sample  matrix  and  is  S 
max 

times  as  large  as  the  maximum  possible  entry  of  the  actual  matrix. 
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Figure  32  plots  the  number  of  bits  required  to  represent  £3  |Xj  where 

r=l 

I X |  £127  (8-bit  2's  complement  representation).  This  plot  puts  an  upper 
bound  on  the  number  of  bits  required  to  represent  inner  products.  The  simplest 
a  priori  bound  on  Amax  is  n*mmax;  once  the  matrix  is  computed,  the  norms  ||M||,, 
11^1  loo*  anc*  (M)  (trace  M)  (see  Isaacson  and  Keller  [1966]  )also  provide 
upper  bounds  to  Amax-  A  lower  bound  for  is  provided  by  the  level  of 

receiver  noise.  Typically,  ADC's  (analog-to-dinital  converters,  used  to  digi¬ 
tize  the  voltage  inputs)  are  adjusted  so  that  the  receiver  noise  causes  the 

least  significant  bit  of  the  converted  signal  to  be  random;  the  receiver 

_2t  2 

noise  is  therefore  close  to  S2  X  ,  where  t  is  the  number  of  bits  used  in 

the  ADC  (this  approximation  may  be  off  by  a  system-dependent  factor  close  to 

1).  Thus  |Wij|  <\fx^/S2'2V  =  21. 

When  using  samples  to  compute  the  w..'s,  we  use  the  statistical  inter- 

* 

pretation  of  the  receiver  inputs  being  complex  Gaussian  random  variables 
with  zero  mean  and  covariance  matrix  M.  The  intermediate  quantities  Z'i.(k) 

J 

are  then  also  Gaussian  random  variables  with  zero  means  and  computable 


variances.  The  quantity  w^.  has  a  complicated  distribution,  but  for  many 

samples  we  can  approximate  it  by  a  Gaussian  distribution,  and  w^  is  similarly 

2  •  ■ 
very  close  to  a  X  variable.  The  quotient  of  the  two  quantities. 


w. .  =  w. ./w. . , 
ij  iJ  iJ 


also  has  a  complicated  distribution,  and  for  this  analysis  we 


replace  it  by  its  expected  value.  The  results  are  as  follows: 


Z'.i(k)  is  a  Gaussian  random  variable  with  zero  means  and  variance 
J 

var[Z'i(k)]  =  zl(j),  where  z](j)  is  computed  as  though  the  actual 

J  J  - ' 


★ 

E.  Isaacson  and  H.  B.  Keller,  Analysis  of  Numerical  Methods, 
New  York,  John  Wiley  &  Sons,  Inc.,  1966. 
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sample  covariance  matrix  were  being  passed  through.  Thus 

X  .  <  var[z'  '(k)]  <  rnmav,  using  the  results  of  Eq.  (32). 

min  —  J  —  roax 

2.  ft  j  is  approximately  Gaussian  with  mean  E(w,y)  =  zl ( i )  (again 
as  though  the  actual  matrix  were  being  passed); 

2 

so  |E(wij.)|  <  mmax,  and  variance  vartw^)  <_  2*nViax/S  =  number 
of  samples). 

3.  w1j  is  approximately  x  with  S  degrees  of  freedom,  a  mean  of 
E(ft.j j )  =  zl  (i ) ;  so  |  E(wi j )  |  £  mmax  and  variance 

var(*Tj)  1  2‘%x/S’ 

Thus,  it  is  a  simple  matter  to  calculate  the  probability  of  overflow 
of  the  Z  j ( k ) 1 s  as  a  function  of  the  number  of  bits,  b,  used  to  represent  the 
Zj(k)‘s  in  fixed  point  (where  N(0,1)  is  a  standard  normal  random  variable): 

Probability  of  overflow  £  P^|N(0,1)|  £2b/vm^x| 

If  we  let  C  =  b  -  log-v^rT”  =  number  of  extra  bits  used  beyond  those  needed 

max 

to  represent  M  ,  we  can  make  the  following  table: 
max 


c 

0 

1 

2 

3 

Probability  of 
overflow 

£31.8% 

£4.55% 

£6.34x10~3% 

£1.24x10"13% 

We  see  that  as  C  increases,  the  chances  for  overflow  become  negligible  quickly. 

Similarly,  if  C  is  the  number  of  extra  bits  beyond  those  needed  to 
represent  mmax,  then  the  probability  of  overflow  when  representing  w^  is 

Probability  of  overflow  £  P[ I M(0,1 )  1  £  »/S/2(2^-l )] 

The  following  table  evaluates  the  above  function  for  various  values  of  C  and 
S. 
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c 

s 

'  1 

2 

3 

1 

48.0% 

3.6% 

7.42x10'5% 

2 

31.7% 

0.27% 

2.58xl0"10% 

4 

15.7% 

1 

2. 14x10"3% 

~2.xlO"21% 

32 

6.33x10'3% 

3. 56x1 O-31 % 

'10"170% 

400 

1.59x10~42% 

~10‘383% 

-10'2171% 

Again,  the  probability  of  overflow  becomes  negligible  quickly. 

If  C  is  the  number  of  extra  bits  needed  beyond  those  to  represent  m_. 

IMaX 

~  _4 

then  for  the  probability  of  overflow  when  computing  w. .  to  be  <10  ,  we  need 

1  J 

the  following  number  of  extra  bits: 


C 

3 

2 

1 

Range  of  S 

1-3 

: 

4-48 

49-co 

This  may  be  derived  using  [P(w..  >  2^m  )  =  P(xc  >  2CS)]. 

'J  —  max  0  — 

Finally,  when  passing  a  steering  vector  through  the  array,  it  must  be 
remembered  that  if  S  is  an  eigenvector  of  M  corresponding  to  A^,  then 
W  =  M-1S  =  (l/A  ^n)S,  so  that  quantities  on  the  order  of  l/Aml-n  must  be 
introduced  in  the  computation.  These  numbers  of  magnitudes  potentially 
much  larger  than  those  already  discussed  can  be  introduced  either  during 
division  by  <Z^,Z^.>  at  the  right  of  the  array  or  during  back  substitution. 
Thus  the  unit-vector  and  transform-space  methods  miqht  have  an  advantage  over 
the  reverse  flow  method  because  they  do  not  require  passing  numbers  as  large 
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as  V^mi*n  back  through  the  array.  The  unit  vector  method,  of  course,  passes 
numbers  as  large  as  ^ mmax^^mi n) =  lw-jjl  through  the  array.  Still,  the 
transform  space  method  and  unit  vector  method  might  be  able  to  use  the  simpler 
fixed-point  arithmetic  in  the  triangular  array  and  then  only  have  floating 
points  in  the  outboard  processors. 

The  analysis  of  the  method  with  square  roots  is  very  similar  to  the 
above  analysis.  The  Z  1 ( k ) 1 s ,  w. and  w. .  have  the  same  distributions  as 

J  I  J  1  1 

before,  but  the  1/Yw. .  quantity  broadcast  is  new.  It  is  easy  to  derive  its 

Z  ~  o 

distribution  from  that  of  w^j-  In  fact,  since  is  x  with  S  degrees  of 

freedom  and  mean  between  and  mmax>  we  have  the  probability  that  the  most 

significant  digit  of  1 //w^ .  is  more  than  C  bits  to  the  right  of  the  most 

significant  bit  of  l/Ai  is 
'  max 


which  is  very  small,  as  can  be  seen  from  the  last  table.  Also,  the  probability 

that  1/Jw! .  will  need  more  than  C  bits  more  than  1/ A  -  is 

mm 

p  (i/V  “11  I  2C  •  2‘Mw  )  <  r(xl  <-  2'zcs)  • 

which  is  also  very  small  for  C=1  and  S>20  (probability  £  2.8xlCf2°'). 

The  results  presented  in  this  section  are  proved  in  Appendix  G. 
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3.4  GRAM-SCHMIDT  PROCESSING  WITH  0(n)  PROCESSORS 

O 

The  Gram-Schmidt  array  with  0(n  )  processors  can  also  be  implemented 
with  only  0(n)  processors.  This  implementation  can  be  performed  by  collapsing 
the  array  in  the  vertical,  horizontal,  or  diagonal  dimension  (Figure  33);  i.e., 
letting  the  work  of  all  processors  in  a  column,  in  a  row,  or  along  a  diagonal 
be  performed  by  a  single  processor. 

Each  processor  performs  the  same  calculations  as  the  processors  in  the 
0(n  )  array  but  on  multiple  data  sets.  Three  main  differences  between  the 
three  0(n)  implementations  are:  1)  work  is  not  equally  shared  by  the  proces¬ 
sors;  2)  they  have  different  internal -memory  requirements;  and  3)  the  output 
may  be  routed  internally  to  the  same  processor  or  externally  to  another 
processor.  All  of  these  will  be  discussed  in  more  detail.  The  processor 
diagram  for  0(n)  processors  is  shown  in  Figure  34. 

As  can  be  seen  in  Figure  33  for  the  case  of  four  inputs,  one  processor 

2 

must  perform  the  work  of  from  one  to  three  processors  of  the  original  0(n  ) 

implementation.  This  mismatch  is  increased  as  n  increases.  If  W  is  the  amount 

2 

of  work  that  one  processor  in  an  0(n  )  array  must  perform,  then  the  amount  of 
work  for  a  processor  in  an  0(n)  configuration  can  range  from  W  to  (n-l)W. 

Since  the  total  time  in  a  parallel  system  is  determined  by  the  slowest 
processor,  then,  on  the  average,  half  of  the  processors  are  idle. 

The  internal  memory  for  each  processor  must  be  increased  because  it 
must  store  all  of  the  information  required  for  the  calculations  to  be  per¬ 
formed.  This  memory  increase  should,  however,  be  small,  mainly  the  inputs 
and  internal  weights,  which  amount  to  approximately  4(n-l)  complex  words. 

In  an  0(n  )  array,  each  processor  is  independent  of  its  position.  In 


V 


an  0(n)  array,  however,  the  processors  must  be  "aware"  of  the  interconnect 


a.  Vertical  collapsing. 


b.  Horizontal  collapsing. 


c.  Diagonal  collapsing. 


Figure  33.  Collapsing  an  array  into  0(n)  processors.  The  boxes 
represent  work  performed  by  a  processor. 


a.  Standard  configuration. 


b.  Alternative  configuration  for  symmetry. 
Figure  34.  Configuration  using  0(n)  processors 


structure  so  that  they  will  know  where  to  transfer  the  processor  data.  The 
controller  can  handle  this  requirement  by  specifying  the  routing  and,  as 
Figure  34  shows,  all  routing  is  regular.  The  routing  problem  is  not  serious 
and  should  not  prevent  consideration  of  this  method. 

3.4.1  Advantages  and  Disadvantages 

The  obvious  disadvantage  of  0(n)  processors  is  speed.  The  actual 
timing  models  are  developed  in  Appendix  E,  but  time  is  a  function  of  many 

variables.  In  a  block  average  system,  most  of  the  time  is  spent  calculating 

2 

the  internal  weights;  in  this  case,  an  0(n)  system  is  comparable  to  an  0(n  ) 

o  o 

system  (in  actual  practice,  0 ( n )  and  0(n  )  systems  can  be  equal).  The  0(nd) 
system  is  advantageous  when  there  are  a  large  number  of  steering  vectors. 

The  determination  of  which  system  to  use  should  be  based  on  the  relationship 
between  the  number  of  samples,  S,  and  the  number  of  steering  vectors,  K. 

Fault  tolerance  and  reliability  offer  potential  advantages  and  dis¬ 
advantages.  The  0(n)  system  can  be  much  more  reliable  because  of  fewer  parts; 
however,  because  of  the  increased  demand  on  each  processor,  faster  hardware 
operating  in  critical  conditions  may  be  required. 

3.4.2  Tradeoffs  Between  the  Three  0(n)  Designs 

Which  system,  from  Figure  33,  should  be  used  in  a  radar  installation 
depends  on  many  factors.  We  discuss  briefly  the  major  tradeoffs  for  each 
array. 

Since  Section  3.2  showed  that  if  any  column  of  the  array  became  error- 
prone  or  stopped  working,  the  system  could  still  operate,  the  system  in 
Figure  33a  would  be  chosen  for  fault  tolerance  because  one  processor 
corresponds  to  one  column.  In  the  system  in  Figure  33b,  if  the  first 
processor  went  bad,  the  entire  system  would  become  ineffective  which  is  also 


Diagonal  collapsing.  Figure  33c,  has  the  advantage  of  being  able  to 
implement  reverse  flow  without  requiring  backward  busses.  This  implementation 
is  accomplished  by  transposing  the  internal  weights  and  using  forward  flow. 

The  transposition  is  simple  because  each  processor  contains  all  the  internal 
weights  for  the  transposition  (Figure  35).  This  transposition  is  more  dif¬ 
ficult  for  the  other  configurations,  but  reverse-flow  busses  are  cheaper  to 
implement  for  0(n)  systems  because  of  fewer  processors. 


**IU 
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3.5  PROCESSOR  INTERNAL  PIPELINE/PARALLELISM 


Each  processor  must  compute  the  inner  product  of  its  two  inputs  in  order 
to  compute  the  internal  weights.  Once  the  internal  weight  has  been  computed, 
it  is  applied  against  the  processor  inputs  to  compute  the  outputs.  In  this 
section  we  discuss  different  methods  of  implementing  these  computations.  Block 
averaging  is  assumed.  Complex  multiplication  is  performed  with  4  multiplies 
and  2  adds.  Complex  addition  is  performed  with  2  adds. 

3.5.1  Compute  Inner  Product 

The  mathematical  formula  for  calculating  the  numerator  to  determine  the 
weights  is 

S  * 

£  X  Y  , 
i=l 

where  X  and  Y  are  input  samples  and  S  is  the  number  of  samples.  Typically 
S  2n,  where  n  is  the  number  of  adaptive  weights.  The  primitive  operations 
consist  of  multiply  and  add/sub.  Let  the  time  per  operation  be  t.  Input  time 
is  assumed  to  be  l/2t. 

The  techniques  examined  are: 

Sequential 

Sequential  with  overlap 
Full  pipeline 
Parallel/2  multipliers 
Parallel/4  multipliers 
Pipeline/2  parallel  multipliers 
Pipeline/4  parallel  multipliers 
Optimal 

Table  1  provides  a  comparison  of  the  numerators  computed  by  the  above 
implementation  techniques. 
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COMPARISON  OF  INNER-PRODUCT  NUMERATOR  IMPLEMENTATION  TECHNIQUES 


4->  4-> 

4-> 

4-> 

O 

r— ” 

VO 

00 

+•> 

+■> 

**"'“*' 

r— 

r— 

r— 

■4-> 

1 

V/) 

1 

OO 

1 

to 

1 

C/0 

CSJ 

■ — 

’ 

+ 

+ 

+ 

+ 

+ 

•M 

CO 

<✓0 

4-> 

00 

oo 

4-» 

4-> 

o 

VO 

O 

00 

r** 

00 

uo 

r— 

■" 

id 

id 

*r 

•r- 

4-> 

4-> 

C 

C 

0) 

QJ 

3 

3 

0“ 

cr 

0) 

a) 

to 

to 

VO 

i“ 

fd 

S- 

t- 

0) 

vd 

■r* 

a. 

Q- 

<\J 

r—  4-> 

01 

<U  r- 

c 

r— 

#d 

<U 

a. 

<d 

»r“ 

a. 

a. 

Sequential 


In  this  case  we  have  one  multiplier  and  one  adder  with  no  overlap: 

read  X  real 
read  X  imaginary 
read  Y  real 
read  Y  imaginary 
multiply  XR  Yr 

multiply  Xj  Yj 

add 

add  add  to  running  sum  real 
multiply  XR  Yj 

multiply  Xj  YR 

add 

add  add  to  running  sum  imaginary 

Number  of  multipliers:  1 
Number  of  adders:  1 
Time  for  5  samples:  10  St 
Time  between  samples:  10  t 

Sequential  with  Overlap 

This  is  a  slight  modification  of  the  sequential  method,  but  the  adder 
and  multiplier  are  separate  functional  units,  which  enables  both  to  be  used  simul 
taneously : 

read  X  real  add  XjYr  +  XrYj  from  previous  sample 
read  X  imaginary 

read  Y  real  add  to  running  sum  imaginary 
read  Y  imaginary 
multiply  XR  Yr 

multiply  Xj  Yj 

multiply  XR  Yj  add  XRYR  +  XjYj 
multiply  Xj  Yr  add  to  running  sum  real 

Number  of  multipliers:  1 
Number  of  adders:  1 
Time  for  S  samples:  (6S  +2)t 
Time  between  samples.:  6t 


This  method  is  similar  to  sequential,  except  that  each  stage  is  implemented 

so  that  multiple  data  streams  can  be  executed  simultaneously.  We  must  also  change 

the  I/O  either  by  transferring  2  words  In  parallel  on  each  bus  or  making  the 

bus  operate  at  time  l/2t.  We  chose  the  latter  for  this  analysis. 

read  X  real 
read  X  imaginary 
read  Y  real 
read  Y  imaginary 
multiply  XR  YR 

multiply  Xj  Yj 

add 

add 

multiply  XR  Yj 

multiply  Xj  YR 

add 

add 

No.  of  multipliers:.  4 

No.  of  adders:  4 

Time  for  S  samples:  lOt  +  (S-l)t 

Time  between  samples:  t 

Parallel /2  Multipliers 

Since  complex  multiply  can  be  done  in  parallel,  we  connect  two  multipliers 

in  parallel: 

read  XR 

read  Xj 

read  YR 

read  Yj 

multiply 

add 

add 

multiply 

add 

add 

No.  of  multipliers:  2 
No.  of  adders:  1 
Time  for  S  samples:  8St 
Time  between  samples:  8t 
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read  XR 

read  Xj 

read  YR 

read  Yj 

multiply 

add 

multiply 

add 

add 


No.  of  multipliers:  4 
No.  of  adders:  4 
Time  for  S  samples:  7$t 
Time  between  samples:  7t 


Pipeline/2  Parallel  Multipliers 


This  is  the  pipeline  method  with  2  parallel  multipliers  at  the 


multiplier  stages-. 


read  XR 

read  Xj 

read  YR 

read  Yj 

multiply 

add 

add 

multiply 

add 

add 


No.  of  multipliers:  4 

No.  of  adders:  4 

Time  for  S  samples:  8t  +  (S-l)t 

Time  between  samples:  t 

Note  that  in  this  case  the  number  of  multipliers  has  not  been  increased,  and 
time  saved  is  only  2t,  buth  with  added  processor  complexity. 
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Pi  pel tne/4  Parallel  Multipliers 

This  method  has  a  full  complex  multiplier  (4  multipliers)  and  complex 

adder  (2  adders)  implemented  in  parallel. 

read  XR 

read  Xj 

read  YR 

read  Yj 

multiply 

add 

add 


No.  of  multipliers:  4 

No,  of  adders:  4 

Time  for  S  samples:  5t  +  (S-l)t 

Time  between  samples:  t 

For  the  large  additional  complexity  in  the  processor,  this  method  is  only  5t 
faster  than  full  pipelined  and  3t  faster  than  the  pipeline/2  parallel  multi¬ 
plier  method. 

Optimal 

The  parallel  and  pipeline  methods  explained  above  can  take  advantage 
of  overlap  simultaneously,  as  in  the  sequential  method.  The  minimum  system  would 
use  overlap  in  a  full  parallel  sense.  This  system  would  look  as  follows  for 
time  slice  t..  : 

read  XR( t^ ) 
read  YR(tt ) 
multiply  for  t^ 
add  for  t^ 
add  for  tj_3 
read  X  j  ( t  ) 
read  Yj(t^) 

No.  of  multipliers:  4 
No.  of  adders:  4 


Time  for  S  samples:  4t  +  (S-llt 

Time  between  samples:  t 

The  time  for  S  samples  is  only  a  savings  of  t  over  the  previous  method. 

A  graphical  comparison  of  the  times  required  by  all  the  above  methods 
versus  the  number  of  samples  is  given  in  Figure  36. 

3.5.2  Calculate  Weights 

In  Section  3.5.1  we  discuss  the  calculation  of  the  numerator  to 
determine  the  weights.  We  now  discuss  the  calculation  of  the  denominator 
and  the  division. 

$ 

The  denominator  is  calculated  using  the  formula  £  X*X.  Since  X 

i=l 

times  its  conjugate  is  a  real  number,  only  the  real  part  needs  to  be  computed. 
This  calculation  can  be  performed  either  in  parallel  with  the  computation  of 
the  numerator  or  in  sequential  order.  We  have  the  following  table  for 
parallel  computation  (Table  2),  The  effective  rate  shown  in  Table  1  is 
not  always  attained.  If  the  denominator  calculation  is  carried  out  in 
parallel  with  the  numerator  calculation  by  separate  processors,  then  the 
sampling  rate  will  be  governed  by  the  slowest  of  the  two  processes  because 
the  data  will  be  placed  on  the  bus  once  for  all  processors.  In  the  case  of 
the  sequential  processing  we  have  a  time  between  samples  of  lOt  and  5t  for 
the  denominator  and  numerator  calculations, respectively.  The  actual 
ystemtime  between  samples  will  therefore  be  lOt.  Because  of  this  dependency 
on  the  numerator  calculation,  we  can  use  the  same  processor  to  compute  the 
denominator  without  serious  time  penalties.  If  the  denominator  is  calculated 
sequentially  in  the  same  processor  as  the  numerator,  time  t  can  be  saved  on 
inputting  the  value  X. 
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The  division  can  be  performed  by  reciprocal  calculation  and  broadcasting 
the  value  in  the  horizontal  direction.  Assuming  no  overlap,  the  time  to 
compute  the  reciprocal  is  the  time  required  to  perform  one  table  lookup  and 
two  iterations  of  Newton-Raphson  (see  Appendix  D).  The  program  to  find  1/C 
would  be  as  follows: 


Table  lookup 

X.j  estimate 

Multiply 

cx1 

Sub 

2-CX1 

Multiply 

( 2— CX, }X, 

X2  new  estimate 

Multiply 

CX2 

Sub 

2-CX2 

Multiply 

(2-CX2)X2 

x3  =  1/C 

Time  =  7t 

To  broadcast  the  value  and  multiply  by  the  reciprocal  in  each  processor 

requires  the  following  code  in  each  processor: 

Read  reciprocal 
Multiply 

Time  =  2t 

3.5.3  Apply  Weight  to  Input  Data 

The  formula  to  apply  weights,  w,  is  output  =  X  -  WY.  This  is  the  same 
form  as  the  inner-product  calculation,  which  can  be  expressed  as 

sum  =  sum  +  XY 

Because  of  this  similarity  the  programs  are  similar,  as  shown  by  the  sequential 
method: 

read  XR 
read  Xj 
read  YR 
read  Yj 
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multiply  WRYR 
multiply  WjYj 
sub 

sub  XR  -  (WrYr  -  WjYj.) 
multiply  WrYj 
multiply  WjYr 
add 

sub  Xr  -  (MrYj  +  WjYr) 
out  XR 
out  Xj 

Assuming  output  takes  time  l/2t,  all  of  the  previous  formulas  are 
applicable  if  a  step  of  t  is  added;  i«e.,  8t  +  (S-l)t  becomes 
8t  +  (S-lJt  +  t  =  9t  +  (S-l}t. 
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3.6  DESIGN  ALTERNATIVES  AND  TRADEOFFS 

This  section  sunmarizes  the  variety  of  system  configurations  described 
in  detail  in  the  first  part  of  this  report  (Sections  2.6  and  3.1  through  3.5), 
and  indicates  how  various  radar  engineering  considerations  might  influence  the 
choice  of  configuration.  Choosing  the  best  configuration  is  a  complicated 
task  and  depends  heavily  on  the  details  of  a  given  radar  system;  therefore, 
we  only  summarize  some  of  the  tradeoffs.  In  particular,  we  conclude  that 
while  a  universal  element  may  exist,  it  might  be  far  from  optimal  for  any 
individual  system,  since  it  would  have  to  be  designed  for  all  possible  worst- 
case  situations,  and  hence  be  very  expensive,  large,  and  power-consuming. 

The  different  design  alternatives  are  summarized  in  Table  3.  (For 
more  details,  see  the  indicated  sections.) 

The  unit  vector  method  requires  simpler  hardware  than  the  reverse  flow 
method,  but  is  slower,  solving  m  n  x  n  systems  in  time  0 (mn)  instead  of  0(m+n). 
The  transform  space  method  requires  the  simplest  hardware,  but  every  sample 
from  which  filter  functions  are  to  be  computed  must  be  passed  through  the 
array,  which  means  there  can  be  fewer  filter  function  evaluations  per  unit 
time  than  with  the  other  two  methods. 

The  different  means  of  calculating  the  weights  not  only  require  dif¬ 
ferent  amounts  of  hardware  (most  for  window  averaging,  least  for  cascaded 
block)  but  have  different  statistical  properties,  requiring  different  numbers 
of  samples  to  converge  or  update  old  estimates,  and  with  different  lag  times. 

The  higher  performance  desired,  or  the  smaller  probability  of  overflow 
desired,  the  more  bits  are  needed;  and  depending  on  the  basic  implementation 
chosen,  it  may  be  possible  to  have  fewer  bits  in  one  part  of  the  processor 
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TABLE  3.  DESIGN  ALTERNATIVES 


1)  Basic  Implementation  (See  Section  2.6) 

a)  Unit  Vector  Method 

b)  Reverse  Flow  Method 

c)  Transform  Space  Method 

2)  How  to  Compute  Inner  Products  (See  Section  3.1) 

a)  Block  Averaging 

b)  Cascaded  Block  Averaging 

c)  Exponential  Averaging 

d)  Window  Averaging 

e)  Number  of  Samples 

3)  Arithmetic  Used  (See  Section  3.3) 

a)  Fixed,  Floating  Point,  Block  Floating  Point 

b)  Number  of  Bits 

c)  With  or  Without  Square  Roots 

2 

4)  n  Versus  n  Processor  Implementation  (See  Section  3.4) 

a)  Horizontal,  Vertical  or  Diagonal  Collapsing 

b)  Unit  Vector,  Reverse  Flow,  or  Transform  Space  Method 

5)  How  to  Implement  Each  Individual  Processor  (See  Section  3.5) 


than  another,  and  even  have  different  kinds  of  arithmetic  (fixed  versus 


floating  point)  in  different  parts  of  the  processor. 

2 

The  choice  between  n  and  n  processors  depends  on  the  speed  desired  and 
cost  considerations.  The  different  0(n)  implementations  have  different  fault* 
tolerance  properties,  require  processors  of  differing  complexities,  and 
differ  in  how  compatible  they  are  with  the  three  basic  implementations. 

How  to  implement  each  individual  processor  also  depends  on  the  speed 
required  and  cost  limits,  and  is  certainly  dependent  on  the  number  of  bits 
and  type  of  arithmetic  chosen. 

Some  of  the  most  important  radar  engineering  factors  affecting  the 
choice  of  system  configuration  are  given  in  Table  4.  We  discuss  how  they 
affect  the  choice,  but  remind  the  reader  that  the  list  is  not  meant  to  be 
exhaustive,  nor  is  there  necessarily  a  single-best  configuration  for  all 
operating  modes  of  a  given  radar  system. 

TABLE  4.  RADAR  ENGINEERING  CONSIDERATIONS  AFFECTING  CHOICE 
OF  SYSTEM  CONFIGURATION 


1)  Number  of  Weights 

2)  Sampling  Rate 

3)  Electronic  Versus  Mechanical  Scan 

4)  Operating  Modes 

5)  Size  of  a  PRI ;  Amount  of  Dead  Time 

6)  Fully  Adaptive  Versus  SLC 

7)  Number  of  Bits  in  the  ADC 

8)  Amount  of  Clutter 

9)  Performance  Required  (in  Terms  of  SNR  Achieved) 

10)  Cost  Requirements 

11)  Reliability  Requirements 

12)  Environment 


How  fast  the  array  must  process  depends  on  the  number  of  weights, 
sampling  rate,  type  of  scanning  used,  system  mode,  and  the  amount  of  dead 
time.  A  higher  sampling  rate  requires  either  a  faster  array  or  downsampling. 

If  the  sampling  rate  is  slow  enough,  the  transform  space  method  might  be 
sufficiently  fast.  If  there  are  many  steering  signals  for  a  given  covariance 
matrix  (e.g.,  electronic  scan),  then  reverse  flow  might  be  preferable  to  the 
unit  vector  method.  If  there  is  a  small  sampling  window  and  large  amount  of 
dead  time  in  each  PRI,  simple  block  averaging,  with  its  long  startup,  might 

be  sufficient  instead  of  a  more  expensive  implementation  like  window  averaging. 

2 

If  overall  speed  requirements  are  low,  n  processors  instead  of  n  might 
suffice. 

If  the  system  is  a  SLC  instead  of  fully  adaptive,  some  of  the  arith¬ 
metic  can  be  simplified. 

More  bits  on  the  ADC  means  more  expensive  hardware  to  retain  accuracy. 

If  there  is  so  much  clutter  in  the  system  that  there  are  not  enough  weights 
to  adapt  to  it,  then  more  than  the  usual  2n  samples  may  be  required  to  get 
good  performance.  In  general,  the  higher  performance  desired,  the  more  bits 
of  accuracy  and  the  more  samples  required. 

The  many  design  alternatives  discussed  have  widely  varying  costs  (n 
versus  n  processors,  and  number  of  bits  carried  for  example),  so  there  are 
many  speed/accuracy/ performance  versus  cost  tradeoffs. 

The  different  designs  have  different  fault-tolerance  properties,  which 
may  also  vary  with  the  type  of  fabrication  techniques  used. 

The  radar  environment  will  influence  the  design  greatly.  A  land-based 
system  with  large  amounts  of  power,  cooling,  and  spare  parts  available  will 
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certainly  have  less  stringent  packaging,  power,  and  reliability  constraints 
than  one  that  is  used  in  the  field. 

The  dimension  of  the  design  space  can  be  reduced  for  problems  of 
interest  so  that  a  universal  adaptive  algorithm  (UAA)  element  does  exist.  One 
such  breakdown  is  shown  in  Table  5.  For  this  radar  system,  the  UAA  element 
would  be  near  optimal  for  most  configurations.  For  the  30-weight  system, 

21  bits  are  required;  for  the  50-weight  system,  22  bits  are  required  for  inner- 
product  calculation.  But  due  to  MSI  and  LSI  integration  sizes,  either  a  22-bit 
or  even  a  24-bit  system  would  be  implemented. 

The  system  designer  must  decide  which  subset  of  all  possible  adaptive 
systems  the  processors  must  support. 

TABLE  5.  SPECIFIC  RADAR  DESIGN  PARAMETERS 

1)  Number  of  Weights  -  30  to  50 

2)  Sampling  Rate  -  20  MHz 

3)  Electronic  Scan 

4)  Search  and  Track  Modes 

5)  Typical  Times  for  Land-Based  Surveillance  Systems 

6)  Fully  Adaptive 

7)  8-Bit  2  Complement  Complex  ADC 

8)  Low  Center 

9)  3  to  8  dB  Down  from  Optimal  SNR 

10)  Low  Cost 

11)  24-Hour  Continuous  Duty  Every  Day 

12)  Land-Based  Stationary  Environment  Constructed  to  Support 

Processor 


3.7  UNIVERSAL  ALGORITHM  HARDWARE  IMPLEMENTATION 

The  Gram-Schmidt  processor  has  a  very  simple  interconnect  structure 

o 

regardless  of  implementation.  The  bus  structure  for  the  pipelined  (0(n  )) 
configuration  is  shown  in  Figure  37.  All  buses  have  a  single  source,  making 
this  configuration's  the  simplest  interconnect  protocol.  The  recursive 
(0(n ) )  configuration  is  shown  in  Figure  38.  It  has  a  slightly  more  com¬ 
plicated  interconnect  protocol  because  one  interconnect  bus  has  several 
sources,  which  implies  that  each  module  must  know  when  it  should  place  its 
data  on  the  bus.  This  bus  can  be  implemented  with  either  open-collector  or 
tri-state  logic.  A  comparison  of  the  two  configurations  shows  that  each 
processing  element  (PE)  has  the  same  inputs  and  outputs;  thus,  with  the 
addition  of  bus-sharing  logic,  a  processor  can  be  made  to  run  in  either 
configuration. 

The  bus  interconnects  may  be  either  one-half  of  a  complex  word  wide  or 
a  full  complex  word  wide.  The  latter  provides  the  potential  for  a  higher 
processing  rate. 

2 

A  variation  combining  the  0(n  )  and  0(n)  configurations  may  be  imple¬ 
mented.  This  variation  is  shown  in  Figure  39  for  the  five-input  case.  It 
enables  the  system  designer  to  meet  the  speed  requirements  without  using  more 

hardware  than  is  really  needed,  a  good  alternative  solution  for  large  n  when 

2 

the  0(n  )  configuration  would  contain  a  prohibitive  number  of  processors. 

The  processor  control  is  no  more  difficult  than  for  the  0(n)  configuration. 

The  configuration  chosen  by  the  system  designer  is  a  function  of  1)  the 
processor  speed;  2)  the  number  of  degrees  of  freedom  required;  and  3)  the 
required  processing  rate,  which,  in  general,  will  be  slower  than  the  sampling 
rate. 
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Figure  39. 


Variation  combining  0(n2)  and  0(n)  configura 
tions.  Five-input  case  is  shown. 


The  processor  configuration  is  governed  by  the  Gram-Schmidt  equations, 
shown  in  Figure  40.  They  have  been  broken  down  into  their  real  and  imaginary 
components.  The  majority  of  the  operations  performed  are  add,  subtract,  and 
multiply.  Only  one  division  is  required  after  all  the  terms  have  been  summed. 
The  divisor  in  all  processors  in  a  row  will  be  identical.  If  this  process  is 
moved  to  a  dedicated  module  in  each  row,  n-2  sets  of  redundant  logic  have  been 
removed  from  each  row.  This  removal  will  reduce  the  amount  of  logic  per  PE, 
reducing  each  PE's  cost  and  mean-time-between-fai lures  (MTBF).  For  a  system 
with  a  large  number  of  degrees  of  freedom,  this  could  result  in  considerable 
savings  in  system  cost  and  greatly  improve  the  system's  MTBF.  The  control 
becomes  no  more  difficult  and  the  bus  structure  does  not  change  for  either 

p 

the  0(n  )  or  0(n)  configurations  (Figures  41  and  42). 

The  configuration  discussed  in  the  paragraph  above  is  expanded  here  to 
show  a  possible  hardware  implementation.  Two  generic  processing  elements 
are  shown.  First,  the  node  processing  element  (NPE)  computes  all  equations 
except  the  divisor  equation.  This  processor  is  present  at  all  the  nodes  in 
the  Gram-Schmidt  array.  The  second  processor,  the  diagonal  processing  element 
(DPE),  computes  the  divisor  and  its  inverse,  and  then  broadcasts  the  inverse 
to  the  NPEs  in  its  row.  There  is  one  DPE  for  each  row  of  the  Gram-Schmidt 
array. 

A  block  diagram  for  a  node  processing  element  (NPE)  is  shown  in 

Figure  43.  This  processor  does  not  include  the  ability  to  calculate  the 

divisor.  This  NPE  can  perform  block  and  exponential  averaging.  In  addition, 

2 

it  can  operate  in  either  the  0(n  )  or  0(n)  configurations.  Because  of  its 
generality,  the  NPE  would  not  be  practical  to  implement  for  use  in  a  real 


Block  G-S  Equations 

ReCw)  =  Re(w)  +  Re(zj)Re(zJ)  +  Im(zJ)ImCzJ) 

Im(w)  -  Im(w)  +  Re(zj)ImCzj)  -  lm(z])Re(zj) 

w  -  S  +  Re(zj)Re(zj)  +  Im(z] ) Im(Z] ) 

Re(w)  =  Re(w)(1/ft) 

Im(w)  =  Im(w)(l/5i) 

Re(zj+1 )  =  Re(zj)  -  [Re(w)Re(zj )  -  Im(w)Im(zj )] 

Im(Zj+1 )  =  Im(zj)  -  [Re(w)Im(zi)  +  Im(w)Re(zj )] 

Exponential  G-$  Equations 

Re(wn)  =  Re(wn_i )S  +  [Re(z1)Re(z])  +  Im(z1 ) Im(Z^ )]C(1 /wn_1 ) (1  -  S)] 

Im(wn)  =  Im(wn_1  )S  +  [Re(zi)Im(zj)  -  Im ( Z^  )Re (zj 0 C(  1 )(1  -  S)] 

wn  =  wn_lS  +  [Re(z1)Re(z1 )  +  Im(zi)Im(zj)](l  -  S) 

Re(zj+1)  =  Re(zj)  -  [Re(wn_., )Re(z1 )  -  Im(wfl_1  )Im(Z^. )] 

Im(z]+1)  =  Im(Zj)  -  [Re(wn_1 ) Im(z1 )  +  Im(wn_1 )Re(z] )] 

Figure  40.  Gram-Schmidt  (G-S)  equations 
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41.  Processor  for  the  0 ( n  )  configuration. 
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From  Processor 


qure  43.  Block  diagram  for  node  processing  element  (NPE). 


system.  The  inherent  cost  advantages  of  using  Gram-Schmidt  would  be  can¬ 
celled  out. 

The  arithmetic  operations  performed  within  the  NPE  shown  are  serial  in 
nature.  The  processing  speed  of  the  NPE  could  be  increased  by  adding  parallel 
computation  abilities  to  the  processing  element.  This  parallelism  could  be 
expanded  to  the  point  where  every  equation  is  implemented  in  parallel  hardware. 
Some  middle  ground  between  these  two  extremes,  one  that  meets  the  processor 
speed  requirements,  will  usually  be  chosen. 

The  block  diagram  for  the  diagonal  processing  element  (DPE)  that  com¬ 
putes  the  comnon  divisor  is  shown  in  Figure  44.  This  processor  will  operate 
in  the  same  modes  as  the  NPE  and  would  not  be  practical  to  implement  because 
of  its  generality.  Examination  of  the  block  diagram  reveals  no  functional 
block  representing  division.  This  block  is  unnecessary  because  of  the 
following  equation  to  calculate  the  inverse  of  a  number: 

C  =  2  -  ABn 


where 

A  is  the  value  whose  inverse  is  to  be  found, 

B  is  the  inverse,  and 
C  is  the  correction  factor  to  the  inverse. 

This  equation  is  computed  recursively,  approximately  doubling  the 
accuracy  of  the  inverse  on  each  iteration.  The  first  value  of  the  inverse  is 
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COMMON 


BUS 


re  44.  Block  diagram  for  diagonal  processing  element  (DPE). 
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a  coarse  value  found  in  a  lookup  table.  These  two  processing  elements  form 
the  basis  for  the  Gram-Schmidt  processor.  Their  physical  implementations 
depend  on  the  system  requirements. 
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4.  CONCLUSIONS 


We  have  studied  in  detail  the  Gram-Schmidt  orthogonal ization  technique 
and  modular  architecture  for  its  implementation.  The  analysis  and  simulations 
show  that  for  blocked  average  Gram-Schmidt,  the  Gram-Schmidt  system  is  iden¬ 
tical  in  performance  to  sample  covariance  matrix  techniques,  in  particular, 
Cholesky  decomposition.  This  result  shows  that  a  universal  adaptive  algorithm 
does  exist  with  a  modular  architecture  since  the  Cholesky  method  is  universal. 

Also,  because  the  architecture  is  tailored  to  the  Gram-Schmidt  algorithm, 
each  processor  performs  simple  operations  of  the  form  z  =  x  +  Ay.  The 
complexity  of  each  processor  is  Tow,  so  that  high  throughput  paths  can  be 
achieved. 

Figure  45  shows  the  timing  curves  for  different  Gram-Schmidt  implemen¬ 
tations.  Figure  46  shows  a  banded  region  for  Gram-Schmidt  compared  against 
the  timing  curves  determined  under  a  previous  RADC  contract  [Liles  et  al .  1978], 
As  can  be  seen  from  these  figures,  systems  capable  of  real-time  operation  are 
feasible. 

Table  6  shows  a  summary  of  the  conclusions  of  this  study.  We  have 
already  discussed  the  first  conclusion,  the  existence  of  a  good  universal 
adaptive  algorithm  (UAA)  which  uses  a  minimum  number  of  samples  and  has  a 
modular  architecture. 

The  importance  of  the  second  conclusion  is  that  while  a  UAA  may  exist, 
it  would  have  to  be  large,  expensive,  and  power-hungry  to  satisfy  the  perfor¬ 
mance  requirements  of  all  possible  radar  systems.  For  example,  a  land-based 
system  with  200  weights  and  a  20-mHz  sampling  rate,  a  stable,  air-conditioned 

★ 

W.  C.  Liles,  J.  W.  Demmel ,  J.  D.  Mallett,  I.  S.  Reed,  and  L.  E.  Brennan, 
Multidomain  Algorithm  Evaluation,  2  volumes,  TSC-PD-B525-1 ,  Final  Report  on 
RADC  Contract  F30603-76-C-031 9,  Technology  Service  Corporation,  Santa  Monica, 
California,  January  1978. 


5.  Timing  curves  for  different  Gram-Schmidt  implementations. 
One  steering  vector;  two  times  the  number  of  weights  equals 
number  of  samples. 


ng  curves  for  various  processors,  showing  banded  region 
Gram-Schmidt. 


TABLE  6.  CONCLUSIONS 

1.  Universal  adaptive  algorithm  with  modular  architecture  exists. 

2.  Universal  adaptive  algorithm  implementation  for  subset  of  all 
adaptive  space  exists. 

3.  Fast  implementations  are  possible  (Figure  45). 

4.  Processors  are  simple. 

5.  Processors  can  be  improved  as  hardware  advances  are  made. 

6.  The  processor  array  can  be  designed  to  have  good  fault- 
tolerance  properties. 

7.  Possible  to  get  a  priori  probability  bounds  on  growth  of 
intermediate  results. 


environment  with  ample  space  puts  far  different  requirements  on  a  UAA  than 
a  5-weight  system  small,  cool,  and  light  enough  to  carry  on  someone's  back. 

But  given  a  reasonably  uniform  subset  of  system  requirements,  a  UAA  does  exist 
for  that  subset.  A  subset  of  interest  is  ground-based,  permanent  installations, 
30  to  100  weights,  high  reliability,  and  adequate  power  supply.  As  advances 
in  digital  electronics  occur,  the  size  of  subset  areas  increases  until  a 
universal  implementation  does  exist. 

One  must  remember  that,  the  algorithm,  the  processor  interconnect 
structure,  and  the  general  processor  design  are  all  universal.  The  only 
thing  which  can  change  is  particular  implementations  (number  of  bits,  etc.) 
of  the  processor. 

The  third  conclusion  dealing  with  speed  has  been  discussed  above  and 
is  demonstrated  by  Figures  45  and  46. 

The  fourth  conclusion  deals  with  the  complexity  of  a  processor.  Each 
processor  is  very  simple  as  shown  in  Sections  2  and  3.5.  Because  of  this 
simplicity  and  the  advances  of  VLSI,  shortly  a  processor  on  a  chip  will  be 
possible.  Since  the  processor  functions  are  well  specified,  as  new  hardware 
becomes  available,  processors  using  the  new  components  can  be  intermixed  with 
other  processors  without  system  degradation.  This  intermixing  is  due  to 
functional  replacement  on  a  processor  level  (conclusion  5). 

Conclusion  6  deals  with  fault  tolerance  of  the  system.  The  Gram- 
Schmidt  array  is  no  more  susceptible  to  noise  caused  by  misaligned  or  broken 
converters,  receivers,  antennas,  than  any  other  technique.  This  was  an 
unexpected,  but  welcome,  conclusion,  since  Gram-Schmidt  implementations  have 
been  shown  to  be  prone  to  noise  on  digital  computers.  But  due  to  our  imple¬ 
mentation  of  Gram-Schmidt  and  the  nature  of  radar  signals,  the  noise  problem 
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is  not  a  concern.  Also,  if  any  column  of  the  array  becomes  bad,  the  system 
loses  one  degree  of  freedom  and  remains  operative--a  condition  known  as 
graceful  degradation. 

Conclusion  7  was  the  most  difficult  to  arrive  at,  and  more  work  should 
be  performed  to  sharpen  the  results.  Conclusion  7  states  that  a  priori 
probability  bounds  on  the  internal  Gram-Schmidt  weights  and  interprocessor 
communication  can  be  determined.  These  bounds  are  used  to  determine  the  number 
of  bits  used  at  different  parts  of  the  processing  system,  and  what  kind  of 
arithmetic  (fixed  point  or  floating  point)  needs  to  be  performed  in  order  to 
avoid  overflow/underflow  and  bound  effects  of  roundoff  error.  In  Section  3.3 
we  discuss  in  more  detail  the  formulation  of  these  bounds  and  actual  number  of 
bits. 
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5.  RECOMMENDATIONS 


Given  the  results  of  this  study,  we  feel  confident  that  the  modular 
approach  has  benefits  for  the  U.S.  Air  Force  in  the  following  areas:  1)  logistics, 
2)  maintainability,  3)  ease  on  new  system  design,  and  4)  low  life  cycle  cost. 

We  believe  that  a  Gram-Schmidt  system  of  0(n)  processors  should  be 
designed  for  high-speed  updating  of  weights.  The  processor  should  be  designed 
to  be  integrated  into  a  30-  to  100-weight  system.  This  system  would  demonstrate 
proof  of  concept  and  would  serve  as  a  test  bed  for  further  research.  The 

o 

processor  designed  for  an  0(n)  configuration  can  also  be  used  to  study  0(n  ) 
array  architectures;  the  inverse  is  not  true.  Thus,  one  processor  can  be 
used  to  study  both  possible  configurations. 

The  processor  should  accumulate  internally  at  least  24  bits  (preferably 
32)  and  transmit  to  other  processors  16  bits.  The  internal  memory  for  I  &  Q 
inputs  should  be  at  least  512  complex  words. 
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PERFORMANCE  OF  THE  SAMPLE  .COVARIANCE  MATRIX 
ALGORITHM  FOR  ADAPTIVE  ARRAYS 


A.  1  INTRODUCTION 

When  an  adaptive  array  is  implemented  digitally,  the  sample  covariance 
matrix  algorithm  provides  a  direct  method  of  computing  the  adaptive  weights 
and  rapid  convergence  independent  of  the  eigenvalues  of  the  covariance 
matrix.  Previous  analyses  of  this  algorithm^ have  assumed  that  the 
weights  are  computed  using  one  set  of  array  element  outputs  and  these 
weights  are  applied  to  later  array  outputs.  This  report  considers  the 
case  where  the  adaptive  weights  are  tested  against  the  same  set  of  data 
used  in  the  weight  computation. 

For  many  applications,  the  multiple  channel  sidelobe  canceller  is 
the  preferred  adaptive  configuration.  It  can  be  shown  that  the  sidelobe 

r?l 

canceller  is  a  special  case  of  the  more  general  adaptive  array.  J  In 
the  next  section  it  is  shown  that  the  general  adaptive  array  problem  can  be 
transformed  to  an  equivalent  sidelobe  canceller  problem,  a  form  which  is 
more  convenient  for  some  analyses.  It  is  also  shown  that  the  array  per¬ 
formance  is  independent  of  this  transformation,  and  that  the  effective 
weights,  output  S/N,  etc.,  can  be  computed  in  any  convenient  coordinate 
system  provided  the  transformation  of  coordinates  is  non-singular. 

[T]  I.  S.  Reed,  0.  D.  Mallett,  and  L.  E.  Brennan,  "Rapid  Convergence 
Rate  in  Adaptive  Arrays,"  IEEE  Trans .  on  Aerospace  and  Electronic  Systems, 

Vol .  AES-10,  No.  6,  November  1974,  pp.  853-863. 

[2]  S.  P.  Applebaum,  "Adaptive  Arrays,"  IEEE  Trans,  on  Antennas  and 
Propagation,  Vol.  AP-24,  No.  5,  September  1976,  pp.  585-598. 
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A.c  COORDINATE  TRANSFORMATIONS 

Let  X  denote  the  column  vector  of  array  element  outputs, 

Xy  =  ( x,  ,x^ , . . . ,x^) ,  and  Sx  denote  the  corresponding  signal  vector.  The 
noise  covariance  matrix  of  the  array  outputs  is 

Mx  =  E  X  X+,  (A-l ) 

where  Mx  is  a  NxN  Hermitian  matrix  for  an  N  element  array,  E  denotes  the 
expectation,  +  the  complex  transpose,  and  all  noise  components  Chut  no 
signal)  are  included  in 

The  weights  which  maximize  the  S/N  ratio  are 

«x  =  H-’  sx  (A-2) 

and  the  corresponding  array  output  is 

l  =  Wx+  X  =  Sx+  M;1  X  (A-3) 


With  optimum  weights,  the  output  S/N  ratio  is 


r 


<Wx  Sx)2  r+  „-l  r 
TT.  TT~  =  sx  Mx  Sx 


Wx  Mx  Wx 


(A-4) 


Let  T  denote  any  non-singular  transformation,  and 

Y  =  T  X 


(A-5) 
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In  the  new  coordinate  system. 


M  =  E  Y  Yf  =  T  M  T  + 

y  * 


sy  •  T  Sx 


“y  ’  <  Sy 


Combining  Eqs.  (A-5)  through  (A-8), 


(A-6) 

(A-7) 

(A-8) 


W„  =  (T1-)"1  Wy  (A— 9 ) 

y  * 

and 

7  =  W+  Y  =  W*  X  (A-10) 

y  x 

i.e.,  the  output  of  the  array  is  unchanged  by  the  transformation. 

The  sample  covariance  matrix  in  X  coordinates  is 

«*  ■  *  £  **  (A-n) 

where  X^  denotes  the  kth  independent  sample  of  array  element  outputs  and  K 
is  the  number  of  samples  in  the  estimator  of  M  .  The  weights  based  on  a 
sample  covariance  matrix  are 

«x  ■  s* 

Replacing  M  and  M  with  the  corresponding  esimators,  M  and  M  ,  and  fol- 

x  y  ^  y 

lowing  the  analysis  of  the  preceding  paragraph,  it  can  easily  be  shown 


\ 
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the  array  output  with  weights  based  on  a  sample  covariance  matrix  is  also 
independent  of  the  transformation  T.  Hence,  the  analysis  of  adaptive  array 
performance,  including  the  sample  covariance  matrix  algorithm,  can  be  per¬ 
formed  in  any  convenient  coordinate  system  provided  the  required  trans¬ 
formation  is  non-singular. 

A.  3  PROBABILITY  DISTRIBUTION  OF  S/N 


For  any  arbitrary  adaptive  array,  the  input  vector  X  can  be  trans¬ 
formed  to  a  new  set  of  coordinates  in  which  the  signal  is  present  in  only 

one  component  and  the  noise  covariance  matrix  is  diagonal.  First,  let 
-1/2 

V  =  M  X  to  diagonalize  the  noise  covariance  matrix.  Then, 


Mv  =  F  M^1/2  X  X+  Mj1/0  =  I 


Sv  ■  Kl'n  Sx 


(A-13) 


Next,  rotate  the  coordinates  by  a  unitary  transformation  U  so  that  the  S 
vector  is  non-zero  only  in  the  first  component,  and  normalize  its  amplitude 
to  unity 


y  =  (s;  1  sxr1/2  u  v 


(A-14) 


-  r'^lJ  M^2 

0  X 


X 
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where  rQ  is  the  S/N  ratio  with  optimum  weights,  given  by  (A-4).  Then, 


My  -  r;1  U  m;1/2  Hx  U+  .  J-  I 


S  .  r-'/2  „  „-l/2  Sx  . 


(A-15) 


(A-16) 


Note  that  the  output  S/N  ratio  in  the  new  coordinate  system  is 


sy  "y1  sy  *  <My):i{  (A',7) 


i.e.,  the  (1,1)  element  of  the  matrix  M~^ . 

The  sample  covariance  matrix  algorithm  can  be  analyzed  conveniently 
in  the  new  coordinate  system.  The  subscript  y  will  be  dropped  in  the 
following  equations.  Again,  the  sample  covariance  matrix  is 


M  =  1  T  y  Y' 
K  |“j  k  k 


The  weights  based  on  this  estimator  are 


W  =  M'1  S 


(A-15) 


(A-19) 


1 


When  these  weights  are  tested  on  a  different  set  of  samples  than  those 

/\ 

used  in  estimating  W,  the  S/N  ratio  r^  is 


r, 

1  W  M  W 

The  ratio  of  r-j  to  the  S/N  with  optimum  weights  is 


(A-20) 


Pi 


(S+  M"1  S)2 

— + — it — — +  — n — 
(S  M  1  S)(ST  M  1  M  M"1  S) 


(A-21 ) 


The  probability  density  of  this  variable  P|  was  derived  in  [1], 

In  some  cases  of  interest,  the  weights  may  be  applied  to  the  same 
set  of  samples  used  in  computing  W.  In  this  case,  the  output  S/N  ratio  is 


+  ~  i  ~  i 

r  =  S  M  1  S  =  (M  ')n 


(A-22) 


The  analysis  of  [1]  can  be  extended  to  obtain  an  expression  for  the  pro¬ 
bability  density  of  r. 

Tl  31 

The  sample  covariance  matrix  has  a  complex  Wishart  distribution1-  ’  , 

i  .e. , 


p(A)  =  exP  {  "tr(M_1  A)} 


(A-23) 


[3]  N.  R.  Goodman,  "Statistical  Analysis  Based  on  a  Certain  Multi¬ 
variate  Complex,  Gaussian  Distribution,"  Ann.  Math.  Stat. ,  Vol .  34, 

March  1963,  pp.  152-177. 


where  I  A|  denotes  the  determinant  of  A,  N  is  the  number  of  elements  in  the 
array,  tr  denotes  the  trace  of  the  matrix,  and  A  =  K  M.  The  constant  I (M) 
is  a  function  of  K,  N,  and  the  covariance  matrix  M.  In  Eq.  (A-23),  P (A ) 
is  the  joint  probability  density  of  the  elements  of  A,  and  is  restricted 
to  those  matrices  A  which  are  positive  definite.  It  assumes  that  the 
underlying  noise  process  is  complex  Gaussian. 


Consider  the  following  representation  of  the  matrix  A. 


( A-24 ) 


where  A^  is  a  scalar,  is  a  (N-l)xl  column  vector  equal  to  A12’  and 
is  a  (N-l)x(N-l)  njatrix.  As  in  [1],  A  can  be  factored  as  follows 


and 


(A-25) 


=  I  A-j  -j~A1 2 


rl  A 
72  21 


72l 


(A-26) 
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Let 


D11  =  A1 1 ”A12  A22  A12 
°12  =  A12  =  A21 
°22  =  A22 

The  Jacobian  of  the  transformation  from  (A^,  A-^,  A ^ )  to 
is  one,  so 


P^DirD12,D22^  =  °nN  I  °22 f  K  N  TW  exp{  ~ ( D1 7  +Dl 2  °22  D1 

=  p(^l}^  P^1 2’^22^  * 


where 

p(Dn)  =  C1  D^N  exp|-r0D11| 


and  C-|  is  a  constant. 

Representing  A"1  in  the  same  form  as  Eq.  (A-24), 


it  can  easily  be  shown  that  A^  =  .  Since  A  =  K  M  and, 

U11 


(A-27) 

Dir  D12*  °22) 

+  tr  °22)ro} 
(A-28) 

(A-29) 

(A-30) 

from  Eq.  (A-22) 


138 


the  output  S/N  ratio  is 


r- 

Let  p  =  r/r  =  — — —  .  From  Eq.  (A-29) 

0  ro  U11 

r  i^K“N+1 

P(p)  g~'/-Nt2  e*p{-  • 

Normalizing  this  distribution,  C2  =  ,  and 


(A-31 ) 


(A-32)  * 


P(p)  - 


,K-N+1 


(K-N)! 


K-N+2  exp 


(A-33) 


This  is  the  probability  density  function  for  the  normalized  output  S/N 

A 

ratio,  p  =  r/r  ,  when  the  same  samples  are  used  in  M  for  computing  the 
weights  and  for  testing  the  weights. 

From  Eq.  (A-33), 


P  = 


K 

K-N 


(A-34) 


The  mean  output  S/N  ratio  for  the  same  samples  is  greater  than  with 
optimum  weights,  W  =  M"^  S.  That  is,  p  >  1. 


139 


Appendix  B 

DERIVATION  OF  NOISE  SENSITIVITY  RESULTS 

This  appendix  derives  the  results  used  in  Section  3.2  We  know  [Brennan 
and  Reed  1973]  that  the  SNR  of  an  adaptive  system  with  true  covariance  matrix 
M,  steering  signal  S,  and  weights  W  is 

SNRw  =  |S*W|2/W  •  MW  .  (B-l ) 

When  W  =  WqPT  =  M”^S,  we  get 

S*W  =  S*(M_1 S)  =  S*(M*'1M)M'1S  =  (M-1S)*M(M-1S)  =  W*MW  ; 
so 

SNRopT  =  |S*W|2/W*MW  =  S*W  *  W*MW  =  S*M_1S 
When  W  =  WqPj  +  E,  where  E  is  an  error  vector,  we  get 

SNRW  =  |S*(WopT  +  E)|2/(WQpT  +  E)*M(WopT  +  E) 

-  [|S*WopTj2  +  2Re(S*WopT)(S*E)  +  |S*E|2] 

v  (wopt*MWopt  +  2ReWopT*ME  +  E*ME) 

=  (SNR2pT  +  2SNRopTRe(S*E)  +  | S*E | 2 )/ ( SNRQpT  +  2ReS*E  +  E*ME) 

=  snropt  +  ( | S*E| 2  -  snropte*me)/w*mw 
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=  0  -  (E*ME/W*MW)]SNRopt  +  (e*me/w*mw)(|s*e|2/e*me) 

=  [1  -  (E*ME/W*MW)]SNRopt  +  (E*ME/W*MW)SNRe 
=  SNRopT  +  (E*S*SE  -  SNRopTE*ME)/W*MW 

=  SNRopT  -E*[M  -  (S*S/SNR0PT)]e  •  (SNRopT/W*MW)  .  (B-2) 

Since  M  is  positive  definite  Hermitian,  S*S  is  positive  semidefinite  Hermitian 
of  rank  1 ,  and 

x*[M  -  (S*S/SNRopT)]x  =  x*Mx/SNRQpT  •  [SNRopT  -  (x*S*Sx/x*Mx) ] 

=  (x*Mx/SNRopT)(SNRQpT  -  SNRx)  >  0 

we  have  M  -  (S*S/SNRgpy)  is  positive  semidefinite  Hermitian  of  rank  n-1 
(W  =  M_1S  is  in  the  null  space)  and 

0  <  X[M  -  (S*S/SNRopt)]  <  Xmx(M)  -  Xm1nS*S  =  X|nax(M)  . 

Also 

W*MW  =  (WopT  +  E)*M(WQpT  +  E)  =  WopT*MWopT  +  2ReWopT*ME  +  E  •  ME 
>  snropt  -  2 1 | S | |  •  | | E 1 1  +  *min  •  ||E[|2  ; 

so 

SNRopt/W*MW  <  1  +  ( 2 | | S | |  •  | ! E| |/SNRopt)  +  0(||E1|2)  =  1  +  0(||E||). 
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When  W  =  W  +  E,  W  not  necessarily  optimal,  we  get 


SNRQ  =  [|S*Wj2  +  2Re(S*W)(STE)  +  |S*E|2]/(W*MW  +  2ReW*ME  +  E*ME) 

=  (assuming  |2ReW*ME  -  E*ME|  <  1  for  small  E) 

*  [|S*W|2  +  2Re(W*SS*E)  +  |S*E|2]  •  (1/W*MW) 

•  [l  -  (2ReW*M£/W*MW)  -  (E*ME/W*MW)  +  4Re2(W*ME)/(W*MW)2  +  0(||E||3)] 
=  |S*W|2/W*MW  +  (|S*W|2/W*MW)  •  (-2ReW*ME/W*MW)  +  2ReW*SS*E/W*MW 
+  ( |S*Wj2/W*MW)(-E*ME/W*MW)  +  (S*W)2/W*MW-[4Re2W*ME/(W*MW)2] 

-  [4Re(W*SS*E)Re(W*ME)/(W*MW)2]  +  |S*E|2/W*MW  +  0(||E||3) 

=  SNRW  +  SNRw(-2/W*MW)  •  Re[W*(M  -  SS*/SNRW)E] 

+  (SNRW/W*MW)  •  (  (E*SS*E)(1/SNRW)  -  E*ME 

I 

+  4ReW*ME/(W*MW)  «  [ReW*ME  -  Re(W*SS*E)/SNRw] }  +  0(||E||3) 

=  SNRW  -  2(SNRW/W*MW)  •  Re{[(M  -  SS*/SNRw)W]*e) 

+  SNRW/W*MW  Rej[4(KeW*ME/W*MW)W*  -  E*] (M  -  SS*/SNRW)E) 

+  0(| | E | | 3 )  .  (B-3) 


Now  we  turn  to  the  problem  of  inverting  M  =  M  +  A,  where  M  is  a  covariance 


matrix  and  A 


k£ 


{ p  if  k=£=j,  0  otherwise}.  We  may  write 
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M'1  =  (M  +  A)-'  =  M_1M(M  +  A)'1  =  M_I[(M  +  A)M"‘] 


/u  >  * \“1  _ 


,-h-l 


M”1 ( I  +  AM'1)"1 


(B-4) 


th  —  1 

Since  AM  is  all  zeroes  except  for  row  j,  which  is  the  j  n  row  of  M" 
multiplied  by  p,  we  can  write 

-1  .  I_il  -inUth 


/ 

0 

(i  +  p 

mjl  ,Jn 

m  .  .  .  m 

V 

1 - 

O 

(B-5) 


l  •  •  * 

where  M  =  {m.  .}  and  M  =  {m1^}.  The  inverse  of  the  expression  in  parentheses 

1  J 

is  given  simply  by 


-1 


I  +  P 


mjl  . 

0 

.  nr>n 

• 

CL 

1 

II 

m^1  . 

0 

.  mjn 

0 

- 

j  1  +  pmJJ 

0 

(B-6) 


''-l 

so  that  M  is 


"'-1  -1  -1 
M  1  =  M  1  +  M  1 


tl  +  pm 


JJ 


0 

0 


i»jl  .  .  .  .  «,■>" 


(B-7) 


This  formula  is  a  special  case  of  the  Sherman-Morrison  formula  [Dahlquist  and 
Bjdrck  1974,  p.  16ll* 

/N  /V  1 

The  new  weights  W  =  M"  S  are  given  by 


W  =  W  +  M 


-1 


\1  +  pm 


0 


<-row  j 


( B -8 ) 


* 

G.  Dahlquist  and  A.  Bjork,  Numerical  Methods  (trans.  N.  Anderson), 
Englewood  Cliffs,  New  Jersey,  Prentice  Hall ,  1974. 
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""■I  nw 


where  W  -  {Wj}  =  M-1S  are  the  original  weights.  Thus  the  new  weights  equal 

the  old  weights  plus  the  jth  column  of  M'1  times  [-p/(l  +  pnr^)].  We  can 
*_1 

express  S*M  S,  the  optimal  SNR  of  the  system  with  noise,  in  tenns  of  the  old 
optimal  SNR  =  S*M_1S,  using  Eq.  (B-8): 


SNRNEW0PT  =  S*^’ls  =  S*W  =  s*w  +  S*M"'  •  £-pX1  +  Pm J  J )  ]| 


,-l 


WJ 


snroldopt  +  w* 


wj 


C-p/ (1  +  pmJ^)] 


=  SNR, 


OLDOPT 


-  Cp|wj  I/O  +  pm'3,3)] 


(B-9) 


A  A 


If  the  covariance  matrix  is  still  M,  but  weights  W  =  M_1S  are  used,  we  may 

A 

compute  the  new  SNR  using  E  =  W  -  W  and  Eq.  (B-2).  We  need  to  know 


E*ME 


„-l 


VI  +  pmJJ/ 


0 


Wj 


M -V- 

' 1  +  pnOJ  / 


- 1 

o 

_ 1 

W- 

J 

.0  . 

SE- 


(1  +  pmJJ) 


ij  \  2 


‘  0  ' 

★ 

0  ‘ 

w. 

M'1 

w- 

J 

J 

_  0  _ 

0 

P2|»^33 

(1  +  pm^)2 


(B-10) 


and 
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S*E  =  W*ME  =  W* 


(i  +  ^Jj) 


Wj 


1  +  pm'1  3 


(B-l 1 ) 


and 


W*MW  =  SNRopT  +  2ReW*pTME  +  E*ME 

2  ,,  p2|w,|2m33 

=  SNRqpt  +  2p|w.  /(I  +  pm'33)  +  - 3 — -rj- 

UPI  J  fl  +  om33!2 


(B-l 2) 


Then  the  SNR^  is 


snrw  =  snropt 


SNR 


OPT 


SNRopt  -  2Plwj|2/0  +  pmjj)  +  P2|wj|2mjj/(1  +  pm33)2 
•  [p2  | w j |2m33/(l  +  pm33)2  -  P2 I wj | 4/ ( 1  +  pm33)2/SNR0pT] 


=  SNR 


p2 1 w  - 1 2m33/(l  +  pm33 )2 
■ .  ,  ,J  . 


0PT  SNR0pT  -  p|Wj  1 2/(l  +  pmJJ)[2  -  piPjj/d  +  pm^)] 


SNR 


|Wil 


0pT  mjj 


=  SNRqpt  •  P2|wj|2mjj/CSNRopT(l  +  prn33)2  -  p2|wj|2(2  +  pm33)] 


SNR 


,Wi 


OPT  mjj 


( B-l  3) 


By  differentiating  Eq.  (B-l 3 )  with  respect  to  p,  we  can  show  SNRW  is  a  monotonic 
decreasing  function  of  p,  and  that  lim  SNR.,  =  SNRnDT  -  |w.|2/m33,  the  same 

p-x»  w  un  J 

lower  bound  as  for  Eq.  (B-9)  • 
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EXAMPLE  OF  USING  GRAM-SCHMIDT  TO  SOLVE  A  SYSTEM 
OF  SIMULTANEOUS  EQUATIONS 


where 


Assume  the  matrix  equation 


Ax  =  b 


A  is  real  positive  definite  Hermitian  and 


'2 

4 

6' 

A  = 

4 

12 

20 

6 

20 

36 

_ 

_ 

x  =  unknown  = 


The  network  used  is  shown  below. 


Here, 


0  =  i2  -  WI-j 

We  will  solve  for  x  by  decomposing  A  into  the  form  LDL^.  This  decomposition 
is  performed  by  passing  each  row  of  the  matrix  through  the  network,  using  the 
following  steps: 

Step  1 .  Pass  row  1  of  the  matrix  through  the  array  to  calculate  the 
first  row  of  weights. 

246  Row  1 


w-|  2  a 1 3/ a ~  i  6/2  -  2 


d 


11  " 


J11 


=  2 


Step  2.  Pass  row  2  of  matrix  through  the  array  to  calculate  second  row 
of  weights. 


4  12  20 


20  -  3(4)  =  8 
*2  8 

w22  "  =  4  ~ 


2 


d 


22 


4 
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Step  3.  Pass  row  3  of  the  matrix  through  the  array  to  calculate  d 


To  solve  the  system  of  equations,  we  have  to  perform  the  following  steps  as 
well,  because  LDlJx  =  b  —  x  =  l."^[D~^(L  ^b)]: 


Step  4.  Pass  b  through  the  array  to  perform  Lb 


4  16  30 


yl 


y 

=  4 


y2  =  8 

y3  =  2 


Step  5.  z  =  D  and  since  -D  is  diagonal,  it  is  easy  to  invert. 


1/2 

0 

0 

4 

2 

0 

1/4 

'  0 

y  = 

8 

z  =  D_1y  = 

2 

0 

0 

1/2 

2 

1 

- 

- 

- 

_  _ 

Step  6.  x  =  ^z,  which  will  be  done  by  the  reverse  flow  method. 

We  pass  z  through  the  network  in  opposite  direction  from  normal 

-1  0  1 


The  vector  x  has  now  been  determined. 
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We  now  present  a  solution  to  the  problem  being  solved  which  differs  from 
the  previous  solution  because  it  eliminates  steps  4  and  5  by  using  an  augmented 
from  of  the  matrix  (to  be  explained  below). 


Solve  Ax  =  b 


A 


2  4  6 
4  12  20 
6  20  36 


b 


A  augmented 


4 
16 
_  30  _ 

2  4  6  4 

4  12  20  16 
6  20  36  30 


A  augmented  is  simpl.y  the  A  matrix  with  b  added  as  the  last  column.  The 
network  also  has  an  added  column  as  shown  below. 


The  steps  of  the  augmented  form  of  solution  are  as  for>ows: 
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Step  1.  Pass  row  1  of  matrix  through  array  to  calculate  first  row  of  weights. 


Step  2.  Pass  row  2  of  matrix  through  array  to  calculate  second  row  of  weights 


4  12  2(1  16 


Step  3.  Pass  row  3  of  matrix  through  array  to  calculate  third  row  of  weights 


15 


Note  that 


W13  =  2 
W23  =  2 


’33 


These  values  are  the  same  as  z  in  step  5  of  the  previous  solution.  We  can 
proceed  with  the  reverse  flow  or  unit  vector  method  to  determine  x. 


Step  4.  We  now  show  one  step  of  the  unit  vector  method  as  discussed  in 
Section  2.6.1.  The  unit  vector  method  passes  a  unit  vector  through  the  array 
and  performs  the  dot  product  of  the  array  output  with  the  vector  z  to  determine 
an  element  of  x. 

A  column  of  the  array  performs  a  calculation  very  similar  to  dot 
product  and  can  be  used  in  that  manner,  as  shown  below: 


By  setting  1^=0,  the  column  output  is  the  negative  of  the  dot  product 
of  the  column  weights  and  the  input  vector.  We  now  perform  the  unit  vector 
method  to  calculate  x-j. 
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The  array  output  is  1;  therefore,  x-|=-l  because  the  last  column 
output  is  the  negative  of  the  dot  product.  x,=-l  1S  the  same  answer  obtained 
in  the  previous  solution. 

Multiple  b 1 s  (Multiple  Steering  Vectors) 

This  modification  .consists  of  adding  enough  extra  columns  to  handle 
the  additional  b  vectors.  The  illustration  shows  the  configuration  for  4b 
vectors,  and  a  matrix  of  order  3. 


Matrix  Ini'u 


b  Vo-  tor  I nmit 


Let  N  be  the  matrix  dimension.  Then  N  additional  processors  are 
needed  for  each  b  vector.  For  Kb  vectors,  K*N  additional  processors  are 
needed. 

To  determine  x  for  Ax=b,  either  reverse  flow  or  unit  vector  methods 
can  be  used.  If  the  unit  vector  method  is  used,  the  total  time  is  independent 
of  the  number  of  b  vectors.  The  total  time  would  be  derived  as  follows: 

N  =  dimension  of  matrix 
t  =  time  per  array  stage 

There  are  N  unit  vectors  and  N  stages;  thus,  total  time  is 
Tuv  =  Nt  +  (N-l)t  =  (2N-1  )t 

For  the  reverse  flow  method,  the  time  is  a  function  of  the  number 
of  b  vectors  (K).  Using  the  same  definitions  as  above,  the  total  time 

%  =  (N-l)t  +  (K-l)t 
=  (N+K-2 )t 

The  reverse  flow  method  is  faster  for  K<N+1  but  also  requires  more 


complicated  processors. 


Appendix  D 

RECIPROCAL  AND  SQUARE-ROOT  CALCULATION 

Depending  on  which  Gram-Schmidt  method  is  used  (LDL*  or  LL*),  either  the 
reciprocal  or  the  reciprocal  of  the  square  root  or  a  real  number  must  be 
calculated  at  each  level.  These  calculations  are  performed  using  Newton- 
Raphson  iteration.  Newton-Raphson  has  the  advantage  of  quadratic  conversion 
(the  number  of  accurate  bits  doubles  with  each  iteration  step)  and  the 
processing  can  be  performed  without  divisions.  The  general  recursive  rela-^ 
tionship  is  =  xi  -  F(x1- )/F*(x^).  For  reciprocal  of  A  the  function 
F(x)  =  (1/x  -  A).  Applying  the  reciprocal  function  to  the  general  function 
we  have  the  following: 


F(x) 

xi+l  "  xi  '  ro 0 
1 

^  -  A 

xi+l  "  xi  "  -2 

"xi 

■  xt  f  xi  -  ^ 

•  (2-Axpx, 

The  seed  value  for  xQ  is  determined  by  table  lookup  and  is  an 
approximation  of  1/A.  Each  iteration  step  doubles  the  accurate  bits. 

The  square  root  operation  is  performed  almost  identically.  The 

p 

function  F  =  (1/x  -  A),  so  the  recursive  formula  becomes 

xi+i  =  (3  -  A)x./2 


% 
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The  divide  by  2  is  easy  to  implement,  using  a  shift  operation.  The 
reciprocal  of  the  square-root  operation  is  slower  than  the  reciprocal  operation 
due  to  an  extra  multiplication  and  shift. 


I 
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TIMING  EQUATIONS  FOR  BLOCK  AVERAGE 
AND  O(n)  PROCESSORS 

The  processor  computation  abilities  are  those  listed  in  Section  3.4. 
The  0(n)  processors  are  connected  and  perform  the  processing  as  shown  in 
Figure  E-l .  The  reason  for  the  processor  on  the  first  input  is  to  compute 
the  first  denominator  and  its  reciprocal. 

o 

The  timing  equations  developed  in  Section  2.6  for  an  0(n  )  array  apply 

to  the  0(n)  array  when  one  equation  is  to  be  solved.  When  more  equations  are 

2 

to  be  solved  (in  a  pipeline  by  the  0(n  )  array)  the  differences  in  time 

2 

between  the  0(n)  and  0(n  )  processor  implementations  become  apparent.  The 
three  stages  of  calculation  using  the  0(n)  array  are: 

1.  Calculate  all  internal  weights. 

2.  Pass  S  through  the  array. 

3.  Solve  for  the  weight  W. 

We  build  up  the  timing  equation  in  the  above  order  for  three  implementations: 
Sequential 
Full  pipeline 
Optimal 

E.l  CALCULATE  ALL  INTERNAL  WEIGHTS 

For  each  implementation  method,  we  have  the  following  steps: 

A.  Compute  numerator  and  denominator. 

B.  Compute  reciprocal. 

C.  Multiply  by  reciprocal. 

v 

D.  Pass  data  through  array,  applying  internal  weights. 


The  timings  for  these  basic  steps  are: 


configuration . 


iming  measurements. 


A 

B 

C 

D 

Sequential 

10  St 

7t 

2t 

11  St 

Full  pipeline 

10t+(S-l)t 

7t 

2t 

llt+(S-l)t 

Optimal 

4t+(S-l)t 

7t 

2t 

5t+(S-l )t 

We  must  calculate  Internal  weights  at  N-l  levels;  se  the  general 

timing  equation  is 


T  =  (N-l)A  +  (N-l)B  +  (N-l)C  +  ( N-2 ) D 


The  specific 

Sequential 
Full  Pipeline 
Optimal 


timing  equations  are: 

Arbitrary  N,S 
(21 NS+9N-32S-9) t 
(25N+28N-3S-38)t 
(2SN+16N-3S-20)t 


S  =  2N 

(42N2-55N-9)t 
(4N2+22N-38)t 
(4N2+1 0N-20)t 


o 

The  only  difference  between  the  times  for  0(n)  processors  and  0(n  ) 
processors  is  that  with  0(n  )  processors,  there  is  overlap  between  applying 
the  internal  weights  calculating  the  numerator.  This  overlap  can  be  obtained 
with  0(n)  processors  if  the  processing  power  of  each  processor  is  doubled. 
Another  approach  is  to  use  twice  the  number  of  processors  as  shown  in 
Figure  E-2.  This  is  still  an  0(n)  processor  configuration  and  the  processors 
can  be  identical . 


E.2  PASS  S  THROUGH  THE  ARRAY 

With  only  0(n)  processors,  no  pipelining  can  be  performed.  By  design¬ 
ing  the  processors  with  recirculating  pipelines,  we  can  handle  L  steering 
vectors  at  a  time, where  L  is  the  length  of  the  pipeline.  For  K  steering 
vectors,  fx/L-!  passes  are  required.  Another  method  is  to  pass  all  K  vectors 
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NEW  INTERNAL  WEIGHTS 
ROUTED  BACK  TO  FIRST  ROW 


Figure  E-2.  O(n)  processor  configuration  with  same  speed  to 
calculate  internal  weights  as  O(n^)  processors 
using  block  averaging. 


through  sequentially.  In  this  case,  a  memory  (first-in,  first-out  (FIFO) 
buffer)  of  length  K-L  is  needed  for  recirculating  (Figure  E-3).  If  K-L  _>  L, 
then  additional  processing  may  be  disired  on  the  feedback  path  instead  of 
simply  a  FIFO-buffer;  this  is  the  model  we  will  use.  This  model  applies  only  to 
full  pipeline  and  optimal  implementations.  The  sequential  implementation 
does  not  offer  any  pipeline  possibilities.  The  timing  formulas  are: 


Arbitrary  N,K 

K  =  1 

Sequential 

(N-l ) 1 1 Kt 

1 1 (N-l ) t 

Full  Pipeline 

(N-l )max(K,ll )t  +  (K-l)t 

11 (N-l )t 

Optimal 

(N-l )max(K,5)t  +  (K-l)t 

5 ( N-l )t 

The  maximum  function  is  due  to  the  recirculating  buffer.  If  K  is  less 
than  the  length  of  the  pipeline,  then  we  use  the  pipeline  length,  or  else  we 
must  include  the  recirculating  path. 


E.3  SOLVE  FOR  WEIGHTS 

We  solve  for  both  unit  vector  and  reverse  flow  methods. 

Unit  Vector  Timings 

For  the  unit  vector  method,  we  must  pass  through  K+NK  vectors.  We  can 
use  the  timing  formulas  just  developed  for  passing  K  steering  vectors  by 
substituting  K+NK  for  K.  These  formulas  are: 


Sequential 
Full  Pipeline 
Optimal 


Arbitrary  N,  K+NK 
(N2-l )1 1 Kt 

(N-l )max( K+NK.l 1 )t  +  ( K+MK-1 ) t 
(N-l )max(K+NK,5)t  +  (K+NK-1 ft 


N  >  10 
(N^-l )  11  Kt 

(N2K+NK-1 )t 

(N2K+NK-1 )t 


Since  we  are  interested  in  large  cases,  the  overall  timing  formulas 
now  given  are  with  N  >  10:  so  a  recirculating  pipeline  is  used. 
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For  K  Input  Vectors 

L  Forward  Pipeline  Length 

Requ i re 

K-L  Feedback  Stages 
Effective  Pipeline  Length  = 
Max(KjL) 


Figure  E-3.  Recirculating  pipelines. 


Timing  Formulas  for  Complete  Process 
Arbitrary  S,K;  N  >  10 
Sequential  (21 SN+1 1 N2K+9N-1 1 K-32S-9) t 

Full  Pipeline  (2SN+N2K+NK+28N-3S-39)t 
Optimal  (2SN+N2K+NK+16N-3S-21 )t 


N  >  10,  S=2N,K=1 
(53N2-55N-20)t 
(5N2+23N-39)t 
(8N2+11 N-21  )t 


For  the  special  case  (N  >  10,  S=2N,K=1)  the  above  are  approximately 

p 

two  times  slower  than  the  0(n  )  implementations.  As  K  increases,  this 

2 

difference  becomes  larger  due  to  this  N  K  term  in  the  above  formulas. 

Reverse  Flow  Timings 

The  reverse  flow  timings,  in  general,  are  not  simply  twice  the  forward 
flow  for  K  steering  vectors  as  is  the  case  with  0(n  )  processors.  The  excep¬ 
tion  is  when  diagonal  collapsing  is  used  because  then  the  array  is  symmetric 
from  both  the  input  and  output  ports.  The  reverse  flow  timing  for  horizontal 
comparison  is  the  same  as  forward  flow  from  K  steering  vectors  for  vertical 
comparison.  For  vertical  comparison  just  interchange  horizontal  and  vertical 
in  the  above  sentence.  This  is  summarized  in  Table  E-l .  We  will  only  derive 
the  timing  for  vertical  collapsing. 


TABLE  E-l.  TIMINGS  FOR  THREE  0(n)  IMPLEMENTATIONS 


Impl ementation 

Pass  K  Steering  . 
Vectors  Through  Array 

Reverse 
K  Steering 

Flow 

Vectors 

Vertical 

Collapsing 

Horizontal 

Collapsing 

*  - - — 

-  -----  . . j 

Diagonal 

Col  lapsing 

NOTE:  Times  at  end  of  double  header  arrows  are  the  same. 
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The  last  processor  must  operate  on  (N-l)  streams  before  the  other 
processors  can  compute  their  tasks.  After  the  last  finished  there  are  (N-2) 
processors  waiting  for  the  output, which  must  be  processed  in  order.  A 
question  which  arises  is  whether  to  process  one  steering  vector  at  a  time 
or  the  ith  element  from  all  steering  vectors  before  the  i-lst  element.  Both 
of  these  methods  have  the  same  throughput  rate. 

The  timing  equations  for  reverse  flow  only  are: 

Arbitrary  N,K 
Sequential  ( KN-K+N-2 ) lit 

Full  Pipeline  (KN-K+1 1 N-l 2)t 

Optimal  ( KN-K+5N-6 ) t 

The  computer  timing  equations  for  all  three  steps  are: 

Arbitrary  N,K,S 

Sequential  (21SN+22NK+20N-22K-32S-31 )t 

Full  Pipeline  [25N+NK+39N-3S-51+(N-1 )max(K,ll )]t 

Optimal  [25N+NK+21 N-3S-27+(N-l )max(K,5)]t 

Because  of  the  symmetries  between  horizontal  and  vertical  collapsing, 
the  above  formulas  apply  to  both. 

Comparison  of  Unit  Vector  and  Reverse  Flow  Times 

The  reverse  flow  method  is  faster  if  the  following  conditions  hold: 

N  >  10 


S=2N,K=1 

(42N2-22N-53)t 

(4N2+45N-62)t 

(4N2+21N-32)t 


Sequential 
Full  Pipeline 
Optimal 


All  cases 

N2K+1 2>1 1 N  +  (N-l )max(K,ll ) 


All  cases 
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If  we  add  the  restriction  that  N  must  be  greater  than  22,  reverse 
flow  is  always  superior.  This  restriction  is  not  unreasonable,  since  our 
effort  is  directed  at  large  adaptive  arrays. 

Special  Case 

As  already  mentioned  an  0(n)  system  can  determine  the  internal  weights 
o 

as  quickly  as  an  0(n  )  system  if  the  processors  are  twice  as  powerful  or 

doubled  up.  The  process  of  applying  the  transformation  to  the  steering 

vector  and  then  using  the  unit  vector  or  reverse  flow  method  can  also  be 

2 

performed  on  an  0(n)  system  at  the  same  rate  as  on  an  0(n  )  system  with  the 
same  processor  speeds.  The  method  depends  upon  the  diagonal  collapsing 
(Figure  E-4).  In  this  processor, organization  of  each  row  has  independent 
processors  as  does  each  column.  Therefore,  if  only  one  input  vector  exists 
(one  steering  vector,  K=1 )  then  at  each  stage  no  waiting  is  encountered  for 
busy  processors.  The  following  table  specifies  the  conditions  in  which 
this  method  is  equal  in  time  to  an  0(n  )  array. 


Unit  Vector 

Reverse  Flow 

Sequential 

Never 

K=1 

Full  Pipeline 

K+NK  <  1 1 

K  <  11 

Optimal 

K+NK  <  5 

K  <  5 

The  above  table  shows  that  the  unit  vector  method  is  never  equal  to 

2 

0(N  )  array  times  for  N  of  interest  (N  >  10).  The  reverse  flow  method 
rating  is  independent  of  N,  which  is  desirable.  We  also  know  that  with 
diagonal  collapsing,  the  reverse  flow  method  can  be  used  with  forward  flow. 
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Figure  E-4.  Diagonal  collapsing 
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PROCESSING  TIMING  FOR  BLOCK  AVERAGE  Q(n2)  SYSTEM 

This  appendix  determines  system  throughputs  as  a  function  of  the 
degrees  of  freedom,  N,  the  number  of  samples,  S,  and  the  number  of  steering 
vectors,  K.  This  process  can  be  divided  into  three  subfunctions: 

1.  Calculate  all  internal  weights. 

2.  Pass  S  through  array. 

3.  Solve  for  weights,  using  unit  vector  or  reverse  flow  method. 

We  build  up  the  timing  equation  in  the  above  order  for  three  implementations: 
Sequential  (worst  case) 

Full  pipeline 
Optimal  (best  case) 

We  also  assume  separate  processors  for  denominator  calculations. 

F.l  CALCULATE  ALL  INTERNAL  WEIGHTS 

For  each  implementation  method  we  have  the  following  steps: 

A.  Compute  numerator  and  denominator. 

B.  Compute  reciprocal. 

C.  Multiply  by  reciprocal. 

D.  Pass  data  through,  applying  internal  weight. 


) 


The  last  step  can  be  combined  with  the  first  step,  so  that  as  the  internal 
weights  are  applied  the  outputs  are  used  to  calculate  the  numerator  and 
denominator  of  the  next  row.  The  timings  for  these  basic  steps  are: 


A 

B 

C 

D 

Sequential 

lost 

7t 

2t 

11  St 

Full  pipeline 

lot  +  (S-l)t 

It 

2t 

lit  +  (S-l)t 

Optimal 

4t  +  (S-l)t 

It 

2t 

5t  +  (S-l)t 

For  a  system  of  degree  N  there  are  N-l  internal  weights;  so  the  general 
timing  formula  in  terms  of  stages  is: 

T  =  A  +  (N-l)B  +  (N-l)C  +  (N-2)[DA]  , 

where  [DA]  represents  the  time  to  apply  the  weights  and  calculate  the  numerator 
in  the  pipeline.  The  resultant  specific  formulas  are: 

Arbitrary  S  &  N  S  =  2N 

Sequential  11 SN  +  20N  -  12S  -  9  22N2  -  4N  -  9 

Full  pipeline  SN  +30N  -  S  -  20  2N2  +  28N  -  20 

Optimal  SN  +  17N  -  S  -  13  2N2  +  15N  -  13 

F.2  PASS  S  THROUGH  ARRAY 

This  is  the  same  calculation  as  applying  the  internal  weights  to  K  input 
vectors  (remember  K  is  the  number  of  steering  vectors).  The  time  formulas  are 
as  follows: 

Arbitrary  N,K  K=1 

Sequential  (UN  +  UK  -  22)t  (11N  -  ll)t 

Full  pipeline  (11N  +  K  -  12)t  (UN  -  ll)t 

Optimal  (5N  +  K  -  6)t  (5N  -  5)t 
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After  passing  the  steering  vectors  through  the  array,  the  resultant 
vector  must  be  multiplied  by  all  the  reciprocals  of  the  denominators.  Since 
these  values  have  already  been  calculated,  only  a  real -times-complex  multiply 
must  be  performed,  as  follows: 

Sequential 
Multiply 
Multiply 
Time  2t 
Total 

Time  (TIN  +  11K  -  20)t 

F.3  SOLVE  FOR  WEIGHTS 

We  first  explore  the  unit  vector  (UV)  method.  This  method,  as  explained 
in  Section  2.6.1,  consists  of  passing  N  (or  N-l )  unit  vectors  through  the 
arrays  and  performing  a  dot  product  on  the  output.  This  process  is  similar  to 
passing  S  data  vectors  through  the  arrays,  so  the  same  formula  holds.  Since 
the  steering  vector  (or  vectors)  will  pass  through  the  arrays  first,  we  are 
passing  a  total  of  K+NK  vectors  (N  unit  vectors  for  each  steering  vector). 

The  processing  time  for  dot  product  is  entirely  overlapped  by  the  pipeline 
nature  of  the  array  except  for  one  complex  multiply  and  add  at  the  end.  The 
delay  for  this  last  multiply/add  is  equivalent  to  applying  the  weight  for 
one  input  (lit,  lit,  and  5t  for  the  three  methods  discussed). 


Full  pipeline  Optimal 

Multiply  Multiply 

Multiply 

2t  t 

(UN  +  K  -  10)t  ( 5N  +  K- 5 ) t 


Unit  Vector  Times 

Sequential  11NK  +  22N  +  11K  -  22 

Full  pipeline  NK  +  UN  +  K  -  12 

Optimal  KN  +  5N  +  K  -  6 

Time  for  the  entire  processing  time  can  now  be  determined  by  adding 
these  times  to  the  times  for  passing  through  the  raw  data  vectors.  The 
resultant  timing  formulas  are: 

Arbitrary  N,  S,  K  K=1 ,  S=2N 

Sequential  11SN  +  1  INK  +  33N  -  12S  +  UK  -  23  22N2  +  20N  -  12 

Full  pipeline  SN  +  NK  +  41 N  -  S  +  K  -  31  2N2  +  40N  -  30 

Optimal  SN  +  KN  +  22N  -  S  +  K  -  19  2N2  +  21 N  -  18 

Reverse  Flow  Times 

The  reverse  flow  method  is  exactly  the  opposite  of  passing  the 
steering  vectors  through  the  array,  so  the  basic  timing  formula  is: 

PASS  RAW  DATA  +  2  *  PASS  STEERING  VECTOR  +  COMPLEX  MULTIPLY 

The  extra  complex  multiply  is 'to  perform  the  final  calculation  of  passing 
the  steering  vectors.  The  timing  formulas  are: 

Arbitrary  N,  S,  K  K=1 ,  S=2N 

Sequential  11 SN  +  44N  -  12S  +  22K  -  51  22N2  +  20N  -  29 

Full  pipeline  SN  +  52N  -  S  +  2K  -  42  2N2  +  50N  -  40 

Optimal  SN  +  27N  -  S  +  2K  -  24  2N2  +  25N  -  22 
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Comparison  of  Reverse  Flow  and  Unit  Vector  Times 

We  can  compare  the  timing  equations  for  reverse  flow  and  unit  vector 
methods  to  determine  which  is  faster.  The  results  of  this  comparison  are 
shown  below: 


Reverse  Flow  Faster  If 
Sequential  All  cases 

Full  pipeline  NK  +  11  >  TIN  +  K 

Optimal  NK  +  6  >  5N  +  K 

As  K  approaches  N  the  reverse  flow  method  is  better.  Which  method  is  actually 
implemented  is  determined  not  only  by  K  and  N  but  also  b.y  the  timing  requirements 
of  the  system  to  determine  the  required  level  of  complexity  in  the  processors. 
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PROOF  OF  RESULTS  ON  GROWTH  OF  INTERMEDIATE  RESULTS 


We  first  consider  the  case  of  the  algorithm  without  square  roots,  where 
the  actual  elements  of  the  covariance  matrix  are  passed  through  the  array. 
Since  the  array  does  nothing  more  than  the  Cholesky  LDL*  decomposition  of  M, 
we  will  first  change  notation  to  show  how  the  zl(k)  correspond  to  intermediate 

J 

values  in  the  Cholesky  decomposition,  and  then  with  this  simpler  notation 
prove  the  stated  results. 

Our  claim  is  that 


zk<k>  *  mkk  - 

k  y 

zk(k)  ’  mkk  -  £, 

z>>  ‘  "jk  -  % 

(krVr 

(G-la) 

Vkrdr  =  dk 

(G-lb) 

VVdr  i— k 

(G-lc) 

wkj  ■  zj<k>'zk<k> 

A 

*«-D 

II 

(G-ld) 

where  L  =  {£.. }(unit  lower  triangular),  D  =  diag  { d . } ,  M  =  {m.,1,  and  M=LDL*. 
J  K  J  J  K 

The  proof  is  by  noting  that  the  array  and  the  Cholesky  decomposition  perform 
the  same  column  operations  on  M,  and  that  the  array  recomputes  some  intermed¬ 
iate  values  (because  no  time  is  lost  due  to  paral lei  ism, and  interprocessor 
communication  is  simplified). 

We  may  now  show 


\»>n  i  zk(k)  "  "kk  -  t  Wdr  i 

r=  i 

i 

\»in  i  Zk(k>  '  mkk  -  £lkr*krdr  i  ” 

r=l 


max 


max 


(i<k) 


(G-2a) 

(G-2b) 


where  X  . 
min 


=  minimum  eigenvalue  of  M  and  m 


max 


max  |m.  . I . 

n  ,J 
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We  need  several  well-known  facts,  m  =  maxim.. I  since  our  matrix  is 

max  .  n 

positive  definite  Hermitian  [Isaacson  and  Keller  1966].  The  separation 

theorem  [Franklin  1968]  states  that  if  M  is  an  nxn  positive  definite  Hermitian 

matrix,  and  if  M„  is  the  mxm  matrix  consisting  of  the  first  m  rows  and  n 

columns  of  M  with  eigenvalues  X™  <_  X™  <_  . . .  £A™,  then  the  xj  satisfy 

X*  <  A™-1  f.  X™  <_  . . .  X™~ij  1.X™.  In  other  words,  the  m-1  eigenvalues  lie 

in  between,  or  separate  the  m  eigenvalues  of  the  larger  matrix.  Also,  if 

m  m 

M=LDL*,  D=diag(dj),  then  det  Mm  =  j{Z|  X™  =  d  ^ ,  the  equality  of  which 

follows  from  the  fact  that  det  M  =  det  D  =  and  the  Cholesky  decomposition 
of  M  produces  the  same  and  d.  values  for  i  <  m.  Finally,  to  prove 

m  ij  i  — 

Eq.  (G-2a),  we  write 


m 

d  =  Ji.  d. 

m  J  =  1  J 


/  m-1 

/  J]  d.  =  det  M  /det  M  , 
/  j-1  j  m'  m-1 


m 

n 

j=i  j 


m-1 

.n  \ 

j=i  J 


m-1 


so  since 


,m-l 

Aj 


<  xm 
-  Aj+1 


>  x; 


mm 


The  other  half  of  the  inequality  follows  by  our  noticing  that  the  defining 
equation  for  d.  (Eq.  ( G- 1 b ))  starts  with  m..  <  m  and  subtracts  positive 

K  KK  ITIcLX 

p 

numbers  1st,  Id  (d^  >  X  .  >0  since  M  is  positive  definite).  Eq.  (G-2b) 

i  |<ri  r  r  —  min 

1/  j 

follows  from  (G-2a)  since  X  .  <  Z. (k)  <  Z . ( k )  <  m..  <  m  ,  and  the  sum  for 

min  k  k  KK  max 

k  i 

Zk(k)  subtracts  more  positive  terms  than  the  sum  for  Zk(k).  That  these 

bounds  are  sharp  is  obvious  by  considering  any  diagonal  ’matrix. 

★ 

J.  N.  Franklin,  Matrix  Theory,  Englewood  Cliffs,  New  Jersey, 
Prentice-Hall,  1968. 
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Next,  we  show 


|Zk(k)|  <  m 

1  j '  ' 1  -  max 

(G-2c) 

|Z-(k)|  <  2m  ( i < k ) 

1  j  ' 1  —  max  '  ‘ 

(G-2d ) 

Let  *kr  =  *kr  ^  50  that  dk  *jk  '  Zj(k>  ■  "jk  -  S  V  ?jr  O«0)  so 
Zk(k)  -  </Ty  t.k)  =  ekk  k.k.  Since  mkk  =  glij2  and 

J  ~  O  ~  r—  ~  _  |< 

m..  =  V  I  £ .  I  we  have  2..  <  /m  and  2..  <  /m-  so  |Z.(k)|  <  m  , 
jj  ^ 1  jr1  kk  max  jk  max  1  y  n  -  max’ 

•  k-1  „  -  .. 

proving  Eq.  (G-2c).  Also  | Z j ( k ) |  =  |mjk  -  ^  2kr  2jr|  =  |mjk  -  (kth  row 

of  L  up  to  i-1,  jth  row  of  L  up  to  i-l)|  |m . k  |  +  |(kth  row  of  L  up  to 
t  h 

i-1,  j  row  of  L  up  to  i-1)  |  (where  (•,♦)  is  a  complex  dot  product) 

<  mmax  +  | |kth  row  of  L  up  to  i-1 1 |  •  | | jth  row  of  L  up  to  i-1  | | 

(by  Cauchy  Schwartz,  where  ||  •  |j  =  vector  norm) 

-  mmax  +  I lkth  row  of  L| |  •  | |jth  row  of  L | | 

nr—p  rj— t 

=  m  +  /£  |2,|  f  |t  I 

max  kr  Jr 

=  +Jm. ,  ..  finTT  <  m  +  m  =  2m  , 

max  \  ik\  jj  -  max  max  max 

proving  Eq.  (G-2d).  To  see  that  Eq.  (G-2c)  is  sharp  and  (G-2d)  is  sharp 
to  within  a  factor  of  2,  consider  the  matrix 
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a+b  a-b 


L  a-b 


a+b 


for  0  <  b  <<  a . 


Finally,  we  prove 


max 


*jki  = 


(G-2e) 


mm 


Since  mk(<  =  £  U.kr!  we  have  |£kr|  <  ^ 


max 


Also,  |.  I  ■  |tjk 


mm 


since  d,  >  A  .  .  To  see  that 
k  -  mm 


this  bound  is  sharp  to  within  a  factor  of  2,  consider 


1 

2 

i 

a 

1 

0 

1 

2 

0 

1 

a 

3 

_ 

a 

1 

j 

2 

a 

1 

0 

1 

0 

1 

for  large  a; 


so  >  .  «  — j  ,  m  =2,  Pj^.  z;  2a. 
mm  o  2  max  4/  A  . 

2a  j  mm 


Next  we  consider  the  version  of  the  algorithm  without  square  roots 
when  sample  vectors  arc  used  to  compute  the  w^'s.  We  denote  the  inter¬ 
mediate  results  by  Z'^(k)  to  distinguish  them  from  the  Z 1 ( k )  above,  when 
the  actual  matrix  was  used.  We  interpret  the  inputs  Z 1 ( k )  as  Gaussian 
random  variables  with  zero  means  and  covariance  matrix  M,  and  compute  the 
distributions  of  the  z|(k)  which  are  functions  of  the  Z^(k).  The  actual 
distributions  of  the  Z ^ ( k )  are  too  complicated  to  compute  exactly,  but  we 
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make  the  approximation  of  replacing  the  w..‘s  by  their  means  as  soon  as 

^  J 

they  are  computed.  When  the  number  of  samples  is  large,  this  is  an  ex¬ 
cellent  approximation.  (For  other  properties  of  these  distributions,  see 
★ 

[Rao  1965].)  It  is  in  fact  possible  to  write  down  the  exact  distribution 
of  the  w.  ,'s,  see  Rao,  p.  508.  With  this  approximation  we  note  that 

*  J 

Z^(k)  is  a  linear  combination  of  the  input  random  variables,  and  hence 
is  Gaussian  with  zero  mean.  If  we  compute  its  variance,  then  we  will  know 
the  distribution  of  its  possible  values  and  hence  how  many  bits  are  re¬ 
quired  to  represent  it. 

We  now  prove 


var[Z!’(k)]  =  zj(j)  (G-3a) 

where  zl(j)  is  computed  as  though  the  actual  matrix  were  being  passed 

J 

through.  Thus  Xmin  <  var[Z'j(k)]  <_  mmax ,  using  the  results  of  the  last 
analysis. 

In  other  words,  after  the  random  variables  have  passed  through  i 

4.  L. 

rows  of  the  array,  their  variances  equal  the  itr  partial  sums  of  the 
expressions  for  d^.  Let  V  be  the  column  vector  of  random  variables,  so 
that  E(VV*)  =  M  and  ECU'1  V)(L-1V)*]  =  L'1  E(VV*)L"1*  =  D.  Let  Lm  represent 
the  transformation  performed  by  the  first  m  rows.  Then 


* 


C.R.  Rao,  Linear  Statistical 
York,  John  Wiley  and  Sons,  1965. 
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var("ij> =  5  (z!<'>  *  u-u)i2) 

“<  '  ("jJ  •  5  1 2jr ^dr ) 


+  m.. 
1J 


i  2  WS 


i-1  _  2 

E  *ir  *ir  <H 

r=l  ir  jr 


(G-3b) 


mn  mi2 


,  it  is  easy  to  see 
Since  linear  combinations 


If  Q-]  and  Q2  are  normal  with  covariance  matrix 
E(Q1Q2)  =  m12  and  var(Q1Q2)  =  m^  m22  +  m?2  m 
of  multivariate  Gaussian  random  variables  are  multivariate  Gaussian,  we  need 
only  find  out  what  the  variances  and  covariances  are  of  the  variables  used  to 
form  w^.  We  approximate  w^.  as  Gaussian  by  the  central  limit  theorem.  They 
are  given  by  (*)  with  m*1,  and  this  is  our  result.  The  1/S  factor  is  used 
to  cancel  one  of  the  S  factors  "concealed"  in  m2ax.  Since  w. .  &  N(y,o2)  with 
I  Vi  I  i  mmax  and  cT  <  2mmdx/S  we  have 


P(|wijl  i  2‘  "W>  *  P<wij  i  2*  "Wx>  *  P<“ij  <  -2‘  ”max> 


p  ,  'OiLLl  >  2  <  :2\ax  +  -max 


=  P 


,  VI  "max 


S  "'max  VS  "'max 


max 


VI 


m 

max 


<  P 


N(0.1) 


2  m _ -m. 


1 


max  max 


VI 


max 


+  P 


■2  m  +m 

N  ( 0 , 1 )  <  - OlH-JM 


max 


N(0,1)  >^|~(2t-l)  +P  N(0,1)  <  -^|~(2t-l) 
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=  P  N(0,T)  >^|  (Zt  -  1) 

~  ? 

w  —  is  approximately  x  with  S  degrees  of  freedom,  mean 
E(w^ . )  =  Z](i),  |E(w..)|  £  mmax  and  variance 


var^.)  <  2  mmax/S  . 


(G-3c) 


Since  E(w.. )  =  var[Z'l(i)],  the  proof  of  the  first  part  is  easy. 

•  J  * 

Recalling  that  the  expectation  of  the  fourth  power  of  a  standard  normal 
variable  is  3,  we  obtain  var(w.. .)  =  2  var2[ Z1 ^ (i )]/S  £  2  rr/^/S. 


Appendix  H 

SAMPLE  VOLTAGE  VECTOR  MODEL 

To  determine  if  the  sample  covariance  matrix  approach  is  feasible  when 
N,  the  number  of  weights,  is  at  least  200,  we  had  to  use  a  model  of  the  sample 
voltage  vectors  to  write  a  computer  simulation  to  test  the  algorithm  and  its 
implementation. 

The  radar  test  problem  is  arranged  so  that  the  interferers  can  be 
specified  by  their  eigenvalues.  To  do  this  simply,  a  linear  antenna  array 
with  uniform  spacing  and  weighting  was  chosen.  With  this  configuration  it  is 
easy  to  form  multiple  beams  and  place  an  interferer  in  each  beam  so  that  each 
beam  output  contains  only  power  from  that  interferer.  Since  each  beam  output 
is  then  independent,  the  covariance  matrix  of  the  beam  output  is  diagonal  and 
the  interferer  powers  are  the  eigenvalues.  It  is  easy  to  transform  the 
problem  to  element  space,  using  a  unitary  transformation. 

To  implement  this  approach,  the  interferers  are  placed  so  that  all 
except  one  are  at  nulls  of  the  beam  pattern  for  each  beam.  The  voltage 
output  of  a  beam  formed  by  summing  all  element  outputs  is  given  by 

X  =  h  +  ei,j;  +J(N-l)ipi  _  Sin  NQ|>/2), 
d  L‘  +  e  ••■+e  -I  sin  <|>/2 

where 

ip  =  2itD  sin  8 , 

0  =  an  angle  measured  from  the  boresight  of  the  antenna  to  a  point 
source,  and 

D  =  the  antenna  element  separation  in  wavelengths. 

If  interferer  positions  are  chosen  so  that  sin  [N(^n  -  =  0  for  n  t  m, 

they  satisfy  the  condition  for  independence.  This  will  be  true  when 
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N/2(i^n  -  i|<  )  =  kn,  where  k  is  an  integer.  The  desired  interferer  positions 
are,  therefore, 

^  ,  k  =  1 , . . .  ,N-1  .  (H-D 

In  element  space  the  voltage  at  each  of  N  elements  is  given  by 

for  the  K  =  N-l  interferers.  The  Ak  are  the  powers  in  each  interferer  and  the 
R's  are  independent  zero-mean  random  variables  with  ! I  =  1-  The  are 

included,  when  needed,  to  simulate  the  interferer  variations  between  samples, 
and  receiver  noise  whose  power  is  Qn . 

Since  the  interferer  positions  are  known,  the  true  covariance  matrix 
can  be  computed  as 


=  X*  X„ 
m  n 


=  1/K  E  ^ L 
k=l  ¥ 


i-i[2TTk/N(m-n)] 


( H— 3 ) 


A  stochastic  sample  covariance  matrix  using  S  samples  can  be  obtained 
by  forming 


M 

m,n 


sXm  sXn 


( H-4 ) 


where  the  voltages  are  obtained  from  Eq.  (H-2). 

With  this  model  the  eigenvalues  can  be  chosen  to  span  any  desired 
range  of  values,  and  the  number  of  interferers  can  be  varied  up  to  N-l  while 
one  specifies  the  eigenvalues. 
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MISSION 
of 

Rome  Air  Development  Center 

RAVC  planA  and  executes  AeseaAch,  dnvzl.opme.nt,  teAt  and 
A  elected  acquisition  pA.0g4.ams  In  suppoAt  o£  Command,  ContAol 
Communications  and  Intelligence  (C3I)  activities .  Technical 
and  englneeAlng  suppont  utithln  aAeas  o£  technical  competence 
Is  pAovlded  to  ESV  PAogAam  Calces  ( PCs )  and  othen  ESV 
elements.  The  pnlnclpal  technical  mission  aneas  oa e 
communications,  electAomagnetlc  guidance  and  control,  sur¬ 
veillance  06  gnound  a.nd  aerospace  objects.  Intelligence  data 
collection  and  handling,  Information  system  technology, 
lonospkeAlc  pAopagatlon,  solid  state  sciences,  mlcroiM-ve 
physics  and  electronic  Aellablllty,  maintainability  and 
compatibility. 
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